TableExtractNet: A Model of Automatic Detection and Recognition of Table Structures from Unstructured Documents
Abstract
1. Introduction
- Enhancement of Existing Methods: Studying the approach in [4] offered valuable insights into state-of-the-art techniques for table detection and recognition. By analyzing the limitations of these methods, we introduced novel improvements and modifications that refine and optimize the existing techniques.
- Development of Innovative Techniques and Algorithms: New algorithms for table detection and structure recognition were introduced, including the integration of CornerNet and Faster R-CNN with ResNet-18 backbones for table detection. For table structure recognition, a novel Split-and-Merge Module with ResNet-18 and FPN backbone, utilizing Spatial and Grid CNNs, was designed to ensure accurate segmentation and reassembly of complex tables. These algorithms dynamically adapt to varying document layouts, font styles, and image quality, enhancing precision and robustness in table detection.
- Improvements in Table Detection and Layout Handling: Enhancements were made to effectively manage multi-line text within table cells, particularly in densely populated tables, ensuring accurate interpretation across diverse table designs. These improvements address the challenges identified in [4], where previous methods struggled with closely positioned tables and cells containing multi-line content. Specifically, algorithms were refined to better parse and segment complex table layouts.
- Improvement of an Advanced Table Interpretation System: An advanced system was developed to efficiently handle the complexities of tables with multiple structural variations, enabling improved data extraction and analysis. This system overcomes the limitations of previous approaches, particularly the difficulty in processing extremely dense tables with overlapping segmentation masks. The new system incorporates enhanced parsing techniques to precisely delineate table boundaries and cell contents.
- Extensive experimentation and performance evaluation of the proposed techniques and algorithms were conducted. The proposed solutions outperformed state-of-the-art methods, particularly on tables with varying backgrounds, text typefaces, and line colors. This ensures the generalizability of our technique. The experiments also demonstrated the robustness of the adjusted loss functions in terms of table detection and recognition accuracy.
2. Related Work
2.1. Table Detection
2.2. Table Structure Recognition
2.3. Quality Assessment Metrics
2.4. Research Gap
3. Methods
3.1. Table Detection
- Output Shape: The bounding box for each detected table is represented as $(x_1, y_1, x_2, y_2)$, where $(x_1, y_1)$ and $(x_2, y_2)$ are the top-left and bottom-right corners. For multiple tables, the output is the set of boxes $\{(x_1^{(i)}, y_1^{(i)}, x_2^{(i)}, y_2^{(i)})\}_{i=1}^{T}$ for $T$ detected tables. If two tables are detected, the output could be $\big[(x_1^{(1)}, y_1^{(1)}, x_2^{(1)}, y_2^{(1)}),\ (x_1^{(2)}, y_1^{(2)}, x_2^{(2)}, y_2^{(2)})\big]$. This defines the spatial boundaries of each table using the corner coordinates $(x_1, y_1)$ and $(x_2, y_2)$, ensuring accurate localization for subsequent table recognition processes.
- Stage 1: CornerNet: Acts as a region proposal network. It uses a deep convolutional neural network to predict heatmaps for the top-left and bottom-right corners of tables. These corners are then paired using an embedding technique that helps in grouping corners belonging to the same table. The network utilizes a novel corner pooling mechanism to capture explicit boundary information and predicts offsets to compensate for any discretization errors.
- Stage 2: Faster R-CNN: Refines the proposals and performs the final detection. After the proposals are generated by CornerNet, Faster R-CNN refines them by classifying each proposal as a table or non-table and adjusting the bounding box coordinates for precise localization.
- Sub-Stages for Detection: The model operates in two sub-stages for detecting top-left and bottom-right corners separately.
- Dilated Convolutions: These are employed to capture spatial hierarchies and broader context without losing resolution.
- Corner Pooling: A unique mechanism that ensures that the corner predictions are robust and accurate, even with complex table layouts (a minimal sketch of this operation follows this list).
- ROI Align: This process ensures that features extracted from each proposed region are accurately aligned with the region of interest, improving the quality of the subsequent classification and bounding box regression.
- Classification and Bounding Box Regression: Determines the presence of a table within each region proposal and fine-tunes the coordinates of the bounding boxes to snugly enclose the tables.
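To make the corner pooling mechanism referenced above concrete, here is a minimal PyTorch sketch of top-left corner pooling. This is our illustrative code, not the authors' implementation: each location takes the maximum activation over everything to its right and everything below it, so corner evidence can be aggregated from inside the table region.

```python
import torch

def top_left_corner_pool(features: torch.Tensor) -> torch.Tensor:
    """Illustrative CornerNet-style top-left corner pooling.

    features: (B, C, H, W) feature map. For every location we take the max
    over all positions to its right (horizontal pass) and the max over all
    positions below it (vertical pass), then sum the two pooled maps.
    """
    # Running max from right to left: flip along W, cumulative max, flip back.
    right_max = torch.flip(torch.cummax(torch.flip(features, dims=[3]), dim=3).values, dims=[3])
    # Running max from bottom to top: flip along H, cumulative max, flip back.
    below_max = torch.flip(torch.cummax(torch.flip(features, dims=[2]), dim=2).values, dims=[2])
    return right_max + below_max  # strong response at plausible top-left corners
```

Bottom-right corner pooling is symmetric, taking the maxima over positions to the left and above each location.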
3.2. Table Structure Recognition (TSR)
- Output Representation: The TSR output can be visualized as a matrix where each element corresponds to a cell in the table. The cells are defined by their bounding boxes and possibly their content.
- Output Shape: For a table with M rows and N columns, the TSR output shape can be represented as an $M \times N$ matrix, where entry $(i, j)$ holds the bounding box of the cell in row $i$ and column $j$.
- Output Example: For the simple table shown in Figure 5, with 7 rows and 3 columns, the TSR output can be represented as a $7 \times 3$ matrix whose entry $c_{ij}$ is the bounding box $(x_1, y_1, x_2, y_2)_{ij}$ of the corresponding cell (see the sketch below).
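As an illustration of this output representation, the sketch below shows one way to hold an $M \times N$ TSR result in memory. This is our own code, and the `Cell` class and its field names are assumptions for illustration, not the paper's API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Cell:
    bbox: tuple  # (x1, y1, x2, y2) cell bounding box in image coordinates
    row: int     # row index i (0-based)
    col: int     # column index j (0-based)
    text: Optional[str] = None  # optionally filled by a downstream OCR step

def to_grid(cells: List[Cell], rows: int, cols: int) -> List[List[Optional[Cell]]]:
    """Arrange recognized cells into an M x N matrix; positions covered by a
    merged (spanning) cell can be left as None or point to the same Cell."""
    grid: List[List[Optional[Cell]]] = [[None] * cols for _ in range(rows)]
    for cell in cells:
        grid[cell.row][cell.col] = cell
    return grid
```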
3.3. Loss Function
- Table Detection: For the table detection framework, the loss function is defined in two parts. The first part, $L_{corner}$, is a combination of classification and regression losses: the classification loss evaluates the predicted corner points against the ground truth, normalized by the number of tables, $N_{tab}$, and the regression loss refines the positions of these points, normalized by the number of corners per batch, $N_{cor}$. The second part, $L_{rcnn}$, is determined by the classification of region proposals as table or non-table, divided by the number of proposals, $N$, and the regression of the coordinates of these proposals, normalized by the number of foreground proposals, $N_{fg}$. The overall loss for detecting tables is a weighted sum of these two components. To improve the model's performance beyond the original framework in [4], we propose adding Intersection over Union (IoU) loss and Generalized Intersection over Union (GIoU) loss to enhance bounding box regression and localization. These additional losses help in accurately localizing table boundaries by addressing the issues of overlapping and non-overlapping bounding boxes.

The loss function for corner detection, $L_{corner}$, is given by:

$$L_{corner} = \frac{1}{N_{tab}} L_{cls}^{cor} + \frac{1}{N_{cor}} L_{reg}^{cor}$$

The loss for the Faster R-CNN module, $L_{rcnn}$, is defined as:

$$L_{rcnn} = \frac{1}{N} \sum_{i} L_{cls}(p_i, p_i^{*}) + \frac{1}{N_{fg}} \sum_{i} p_i^{*} L_{reg}(t_i, t_i^{*})$$

where $p_i$ and $t_i$ are the predicted class probability and box coordinates of proposal $i$, and starred quantities denote the ground truth. To further improve the model's performance, we include the following loss functions:

$$L_{IoU} = 1 - \frac{A_I}{A_U}, \qquad L_{GIoU} = 1 - \frac{A_I}{A_U} + \frac{A_C - A_U}{A_C}$$

Explanation of variables:
- $A_I$: the area of intersection between the predicted bounding box and the ground-truth bounding box; this represents the overlapping region of the two boxes.
- $A_U$: the area of union, which is the total area covered by both the predicted and ground-truth bounding boxes, including the overlapping and non-overlapping areas.
- $A_C$: the area of the enclosing box, i.e., the area of the smallest box that completely encloses both the predicted and ground-truth bounding boxes.
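For illustration, here is a minimal PyTorch sketch (our code, not the released training code) computing $L_{IoU}$ and $L_{GIoU}$ directly from $A_I$, $A_U$, and $A_C$ for batches of axis-aligned boxes:

```python
import torch

def iou_giou_loss(pred: torch.Tensor, gt: torch.Tensor):
    """pred, gt: (N, 4) boxes as (x1, y1, x2, y2). Returns per-box (L_IoU, L_GIoU)."""
    # Intersection area A_I
    ix1 = torch.max(pred[:, 0], gt[:, 0]); iy1 = torch.max(pred[:, 1], gt[:, 1])
    ix2 = torch.min(pred[:, 2], gt[:, 2]); iy2 = torch.min(pred[:, 3], gt[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    # Union area A_U = area(pred) + area(gt) - A_I
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_p + area_g - inter
    iou = inter / union.clamp(min=1e-7)
    # Smallest enclosing box area A_C
    ex1 = torch.min(pred[:, 0], gt[:, 0]); ey1 = torch.min(pred[:, 1], gt[:, 1])
    ex2 = torch.max(pred[:, 2], gt[:, 2]); ey2 = torch.max(pred[:, 3], gt[:, 3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    giou = iou - (enclose - union) / enclose.clamp(min=1e-7)
    return 1.0 - iou, 1.0 - giou
```

The GIoU term stays informative even when the boxes do not overlap, which is what makes it useful as a regularizer for non-overlapping proposals.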
The improved total loss for the table detector is:

$$L_{det} = \lambda_1 L_{corner} + \lambda_2 L_{rcnn} + \lambda_3 L_{IoU} + \lambda_4 L_{GIoU}$$

The chosen values for $\lambda_1, \ldots, \lambda_4$ are based on empirical tuning during model training. Higher values for $\lambda_1$, $\lambda_2$, and $\lambda_3$ reflect the importance of accurate corner detection, region proposal classification, and bounding box localization, respectively. A smaller value for $\lambda_4$ is used to balance the effect of this term, as GIoU provides a regularization factor for non-overlapping bounding boxes but has a less direct impact on overlap.

Hyperparameter Tuning and the Effects of Poor Choices: The chosen values are hyperparameters that have been optimized through extensive experimentation. Poorly chosen values for $\lambda_1, \ldots, \lambda_4$ can have significant negative effects on the model's performance. For instance:

- Overemphasis on IoU or GIoU: If $\lambda_3$ or $\lambda_4$ is set too high, the model may focus excessively on bounding box overlap, which can lead to slower convergence or difficulty in distinguishing closely packed tables.
- Imbalanced Loss Contributions: If the values of $\lambda_1$ or $\lambda_2$ are too small, the model may underperform in identifying table corners or classifying region proposals, resulting in lower detection accuracy.
- Difficulty in Training: Improperly chosen values can also make the model harder to train, as certain parts of the loss function may dominate the training process, leading to instability or slow convergence.
To mitigate these issues, a grid search was conducted to find an optimal balance between the different loss components. The selected values represent the best trade-off between accurate table detection and fast model convergence.

- Table Structure Recognition: For Spatial CNN-based separation line prediction in table structure recognition, the process is divided into two separate pathways, one designated for the separation of rows and the other for columns, with each pathway having its own loss calculation. The overall loss for this component is the average of the losses over the sampled pixel counts for rows, $N_r$, and for columns, $N_c$. The row and column delineation labels predicted by the model and the actual (ground-truth) labels are denoted by $\hat{y}$ and $y$, respectively. The loss function calculates the binary cross-entropy between the predicted probability of a separator at each sampled pixel and its ground-truth label. To improve segmentation accuracy and handle class imbalance, we propose incorporating Dice loss and Focal loss; these additional losses help in accurately segmenting the table structures and in addressing class imbalance in the training data. The loss function for predicting separation lines is formulated as:

$$L_{split} = -\frac{1}{N_r} \sum_{i=1}^{N_r} \left[ y_i^{r} \log \hat{y}_i^{r} + (1 - y_i^{r}) \log (1 - \hat{y}_i^{r}) \right] - \frac{1}{N_c} \sum_{j=1}^{N_c} \left[ y_j^{c} \log \hat{y}_j^{c} + (1 - y_j^{c}) \log (1 - \hat{y}_j^{c}) \right]$$

Grid CNN-Based Cell Merging: Here, $N_p$ denotes the number of relational pairs selected for merging cells, and $L_{BCE}^{(i)}$ is the binary cross-entropy loss between the predicted and ground-truth labels for the $i$-th relational pair, reflecting the model's accuracy in predicting whether two cells should be merged. The loss for the cell merging module is thus given by:

$$L_{merge} = \frac{1}{N_p} \sum_{i=1}^{N_p} L_{BCE}^{(i)}$$

The additional losses are defined as:

$$L_{Dice} = 1 - \frac{2 \sum_i \hat{y}_i \, y_i}{\sum_i \hat{y}_i + \sum_i y_i}, \qquad L_{Focal} = -\frac{1}{N} \sum_i \left(1 - p_{t,i}\right)^{\gamma} \log p_{t,i}$$

where $p_{t,i}$ is the predicted probability of the true class at pixel $i$ and $\gamma$ is the focusing parameter. The total loss for the table structure recognizer is:

$$L_{TSR} = \mu_1 L_{split} + \mu_2 L_{merge} + \mu_3 L_{Dice} + \mu_4 L_{Focal}$$

The weighting values $\mu_1, \ldots, \mu_4$ were selected based on experiments during model training. A higher weight on $\mu_1$ and $\mu_2$ indicates the importance of predicting separation lines and correctly merging cells, while lower values for $\mu_3$ and $\mu_4$ reflect their role as regularization terms for segmentation accuracy and class imbalance.

Hyperparameter Tuning and Sensitivity: Similar to table detection, the $\mu$ values for table structure recognition were fine-tuned through a series of experiments. Poor choices for these values can negatively impact the system:
- Overemphasis on Focal Loss: Setting $\mu_4$ too high may lead the model to focus too much on difficult examples, causing slower convergence.
- Unbalanced Loss Weights: Poor tuning of $\mu_1$ or $\mu_2$ may cause the system to inaccurately delineate rows and columns, leading to poor table structure recognition.
We performed extensive tuning through grid search to ensure that the loss components are balanced and the model performs well on both row/column separation and cell merging tasks.
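As a reference for how the auxiliary terms behave, here is a hedged PyTorch sketch of the Dice and focal losses applied to a binary separator mask. This is our illustrative code; parameter names such as `alpha` and `gamma` follow the common focal loss convention and are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def dice_loss(probs: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Soft Dice loss over a binary separator mask; probs and target in [0, 1]."""
    intersection = (probs * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps)

def focal_loss(logits: torch.Tensor, target: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss: down-weights easy pixels so the rare separator
    class is not drowned out by the abundant background class."""
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = torch.exp(-bce)  # model's probability for the true class at each pixel
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()
```

In a training loop, these terms would be added to the split and merge losses with the weights $\mu_3$ and $\mu_4$ discussed above.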
4. Experimental Results and Discussion
- Precision ($P$) measures the proportion of true positive predictions among all positive predictions made. It is calculated as:
  $$P = \frac{TP}{TP + FP}$$
- Recall ($R$) represents the proportion of true positive predictions relative to the total number of actual positive instances. It is calculated as:
  $$R = \frac{TP}{TP + FN}$$
- F1-score ($F_1$): Precision and recall are balanced by taking the harmonic mean of the two measures:
  $$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$
- Average Precision ($AP$) calculates the weighted mean of precisions at each threshold, summarizing the precision-recall curve. The weight assigned to each threshold represents the change in recall from the previous level:
  $$AP = \sum_n \left(R_n - R_{n-1}\right) P_n$$
  This approach allows $AP$ to be calculated by interpolating the precision at each recall level, rather than relying solely on a fixed set of 11 evenly spaced points. This method provides a more accurate and fine-grained assessment of the model's performance across all thresholds.
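A small self-contained sketch (ours, for illustration only) that evaluates these metrics from raw counts and from a precision-recall curve:

```python
from typing import Sequence, Tuple

def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """P = TP/(TP+FP), R = TP/(TP+FN), F1 = harmonic mean of P and R."""
    p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return p, r, f1

def average_precision(precisions: Sequence[float], recalls: Sequence[float]) -> float:
    """AP = sum_n (R_n - R_{n-1}) * P_n, with thresholds ordered by increasing recall."""
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Example: 90 true positives, 10 false positives, 5 false negatives.
print(precision_recall_f1(90, 10, 5))  # (0.9, ~0.947, ~0.923)
```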
4.1. Implementation Details
4.2. Table Detection
4.3. Table Structure Recognition
4.4. Limitations
4.4.1. Error Analysis
4.4.2. Handling Complex Cases
4.5. Ablation Studies
4.5.1. Table Detection with CornerNet + Faster R-CNN Module
Ablation on Loss Functions for Table Detection
- Without IoU and GIoU Losses:
  − On the IIIT-AR-13K dataset: Precision decreased by 2.5%, recall by 3.8%, and F1-score by 3.1%. The model's ability to localize table boundaries, especially for overlapping tables, was significantly reduced.
  − On the STDW dataset: Precision decreased by 3.1%, recall by 4.2%, and F1-score by 3.6%. Complex table layouts with small, irregular tables were less accurately detected without the improvements from IoU and GIoU.
- Loss Weighting ($\lambda_3$, $\lambda_4$) Impact:
  − When lowering $\lambda_3$ and $\lambda_4$, the model's performance degraded significantly in terms of bounding box overlap, indicating that these components are essential for accurate table localization.
Ablation on CornerNet and Faster R-CNN
- Removing CornerNet:
  − On the IIIT-AR-13K dataset: Precision decreased by 4.8%, recall by 4.1%, and F1-score by 4.2%. Without CornerNet, the model struggled to detect tables with irregular layouts, leading to misclassification and missed table corners.
  − On the STDW dataset: Precision decreased by 4.5%, recall by 5.3%, and F1-score by 4.9%. The highly diverse table formats in this dataset made it difficult for the model to accurately detect smaller tables.
- Removing Faster R-CNN:
  − On the IIIT-AR-13K dataset: Precision decreased by 3.2%, recall by 2.3%, and F1-score by 3.5%. The absence of Faster R-CNN affected region proposal refinement, leading to more false positives and difficulties in detecting complex layouts.
  − On the STDW dataset: Precision decreased by 3.4%, recall by 3.6%, and F1-score by 3.2%. Without Faster R-CNN, the model performed poorly on complex table formats.
4.5.2. Table Structure Recognition with Split-and-Merge Module
Ablation on Loss Functions for Table Structure Recognition
- Without Dice Loss and Focal Loss:
  − On the SciTSR dataset: Precision decreased by 2.2%, recall by 3.1%, and F1-score by 2.7%. The model faced difficulties with table segmentation, especially in handling class imbalance in row and column separation, which led to misalignments in tables with irregular structures.
  − On the PubTabNet dataset: Precision decreased by 2.7%, recall by 2.9%, and F1-score by 2.5%. Tables with multi-level headers and merged cells were less accurately recognized without these loss functions, as the model struggled with segmentation accuracy and complex layouts.
- Loss Weighting ($\mu_3$, $\mu_4$) Impact:
  − When reducing $\mu_3$ and $\mu_4$, the model's ability to manage class imbalance deteriorated, leading to poorer segmentation accuracy.
Ablation on Core Modules for Table Structure Recognition
- Removing Spatial CNN:
  − On the SciTSR dataset: Precision decreased by 1.1%, recall by 2.5%, and F1-score by 2.9%. The model struggled with tables having irregular layouts, leading to misaligned rows and columns.
  − On the PubTabNet dataset: Accuracy dropped by 2.3%, and F1-score decreased by 1.7%. The model had difficulty handling complex table structures, especially multi-level headers or merged cells.
- Removing Grid CNN:
  − On the SciTSR dataset: Recall decreased by 3.5%, precision by 2.4%, and F1-score by 3.3%. The absence of Grid CNN led to difficulties in handling merged cells and non-uniform grids.
  − On the PubTabNet dataset: Precision decreased by 3.1%, recall by 1.7%, and F1-score by 2.4%. The model struggled with tables containing irregular grid structures or merged cells.
- Removing Feature Pyramid Networks (FPN):
  − On the SciTSR dataset: Recall decreased by 2.7%, precision by 3.9%, and F1-score by 3.4%. Without FPN, the model struggled to detect tables of varying sizes.
  − On the PubTabNet dataset: Recall decreased by 3.9%, precision by 3.2%, and F1-score by 3.1%. The absence of FPN impacted the model's ability to handle tables with large variations in cell size.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
R-CNN | Region-based Convolutional Neural Network |
CNN | Convolutional Neural Network |
NLP | Natural Language Processing |
TSR | Table Structure Recognition |
OCR | Optical Character Recognition |
RNNs | Recurrent Neural Networks |
FCN | Fully Convolutional Network |
YOLO | You Only Look Once |
TP | True Positives |
FP | False Positives |
FN | False Negatives |
TN | True Negatives |
IoU | Intersection over Union |
GRU | Gated Recurrent Unit |
SCAN | Segmentation Collaboration and Alignment Network |
GTE | Global Table Extractor |
FPNs | Feature Pyramid Networks |
ROI | Region Of Interest |
SepRETR | Separator Regression TRansformer |
References
- Riba, P.; Goldmann, L.; Terrades, O.R.; Rusticus, D.; Fornés, A.; Lladós, J. Table Detection in Business Document Images by Message Passing Networks. Pattern Recognit. 2022, 127, 108641. [Google Scholar] [CrossRef]
- Xiao, B.; Simsek, M.; Kantarci, B.; Alkheir, A.A. Table Structure Recognition with Conditional Attention. arXiv 2022, arXiv:2203.03819. [Google Scholar]
- Borra, V.D.N.; Yelesvarupu, R. Automatic Table Detection, Structure Recognition and Data Extraction from Document Images. Int. J. Innov. Technol. Explor. Eng. 2021, 10, 73–79. [Google Scholar]
- Ma, C.; Lin, W.; Sun, L.; Huo, Q. Robust Table Detection and Structure Recognition from Heterogeneous Document Images. Pattern Recognit. 2023, 133, 109006. [Google Scholar] [CrossRef]
- Siddiqui, S.A.; Khan, P.I.; Dengel, A.; Ahmed, S. Rethinking Semantic Segmentation for Table Structure Recognition in Documents. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, 20–25 September 2019; pp. 1397–1402. [Google Scholar]
- Tensmeyer, C.; Morariu, V.I.; Price, B.; Cohen, S.; Martinez, T. Deep Splitting and Merging for Table Structure Decomposition. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, 20–25 September 2019; pp. 114–121. [Google Scholar]
- Shigarov, A.; Mikhailov, A.; Altaev, A. Configurable Table Structure Recognition in Untagged PDF Documents. In Proceedings of the 2016 ACM Symposium on Document Engineering, Vienna, Austria, 13–16 September 2016; pp. 119–122. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef] [PubMed]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Zheng, X.; Burdick, D.; Popa, L.; Zhong, X.; Wang, N.X.R. Global Table Extractor (GTE): A Framework for Joint Table Identification and Cell Structure Recognition Using Visual Context. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 5–9 January 2021; pp. 697–706. [Google Scholar]
- Prasad, D.; Gadpal, A.; Kapadni, K.; Visave, M.; Sultanpure, K. CascadeTabNet: An Approach for End-to-End Table Detection and Structure Recognition from Image-Based Documents. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 572–573. [Google Scholar]
- Göbel, M.; Hassan, T.; Oro, E.; Orsi, G. ICDAR 2013 Table Competition. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), Washington, DC, USA, 25–28 August 2013; pp. 1449–1453. [Google Scholar]
- Yang, X.; Yumer, E.; Asente, P.; Kraley, M.; Kifer, D.; Lee Giles, C. Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5315–5324. [Google Scholar]
- He, D.; Cohen, S.; Price, B.; Kifer, D.; Giles, C.L. Multi-Scale Multi-Task FCN for Semantic Page Segmentation and Table Detection. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; pp. 254–261. [Google Scholar]
- Wang, H.; Xue, Y.; Zhang, J.; Jin, L. Scene Table Structure Recognition with Segmentation Collaboration and Alignment. Pattern Recognit. Lett. 2023, 165, 146–153. [Google Scholar] [CrossRef]
- Qiao, L.; Li, Z.; Cheng, Z.; Zhang, P.; Pu, S.; Niu, Y.; Ren, W.; Tan, W.; Wu, F. LGPMA: Complicated Table Structure Recognition with Local and Global Pyramid Mask Alignment. In Document Analysis and Recognition–ICDAR 2021, Proceedings of the 16th International Conference, Lausanne, Switzerland, 5–10 September 2021; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2021; pp. 99–114. [Google Scholar]
- Schreiber, S.; Agne, S.; Wolf, I.; Dengel, A.; Ahmed, S. DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; pp. 1162–1167. [Google Scholar]
- Kasar, T.; Bhowmik, T.K.; Belaid, A. Table Information Extraction and Structure Recognition Using Query Patterns. In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Nancy, France, 23–26 August 2015; pp. 1086–1090. [Google Scholar]
- Raja, S.; Mondal, A.; Jawahar, C.V. Table Structure Recognition Using Top-Down and Bottom-Up Cues. In Computer Vision–ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXVIII 16; Springer: Berlin/Heidelberg, Germany, 2020; pp. 70–86. [Google Scholar]
- Zhong, X.; ShafieiBavani, E.; Jimeno Yepes, A. Image-Based Table Recognition: Data, Model, and Evaluation. In Computer Vision–ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXI 16; Springer: Berlin/Heidelberg, Germany, 2020; pp. 564–580. [Google Scholar]
- Lin, W.; Sun, Z.; Ma, C.; Li, M.; Wang, J.; Sun, L.; Huo, Q. TSRFormer: Table Structure Recognition with Transformers. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 6473–6482. [Google Scholar]
- Ngubane, T.; Tapamo, J.-R. Detection and Recognition of Table Structures from Unstructured Documents. In Proceedings of the 2024 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 7–8 March 2024; pp. 221–226. [Google Scholar]
- Padilla, R.; Netto, S.L.; da Silva, E.A.B. A Survey on Performance Metrics for Object-Detection Algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020; pp. 237–242. [Google Scholar]
- Mondal, A.; Lipps, P.; Jawahar, C.V. IIIT-AR-13K: A New Dataset for Graphical Object Detection in Documents. In Document Analysis Systems, Proceedings of the 14th IAPR International Workshop, DAS 2020, Wuhan, China, 26–29 July 2020; Proceedings 14; Springer: Berlin/Heidelberg, Germany, 2020; pp. 216–230. [Google Scholar]
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Smith, J.; Doe, J.; Kim, A. STDW: A Large-Scale Benchmark Dataset for Table Detection in Document Images. In Papers With Code. 2022. Available online: https://paperswithcode.com/dataset/stdw (accessed on 12 September 2023).
- Uijlings, J.R.R.; van de Sande, K.E.A.; Gevers, T.; Smeulders, A.W.M. Selective Search for Object Recognition. Int. J. Comput. Vis. 2013, 104, 154–171. [Google Scholar] [CrossRef]
- Chi, Z.; Huang, H.; Xu, H.D.; Yu, H.; Yin, W.; Mao, X.L. Complicated Table Structure Recognition. arXiv 2019, arXiv:1908.04729. [Google Scholar]
Method | Dataset | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
Zheng et al. [11] | PubTabNet | 94.1 | 93.3 | 93.7 |
Yang et al. [14] | SciTSR | 92.1 | 90.4 | 91.2 |
He et al. [15] | TableBank | 93.5 | 92.7 | 93.1 |
Wang et al. [16] | STDW | 90.5 | 89.2 | 89.8 |
Qiao et al. [17] | PubTabNet | 95.6 | 94.4 | 95.0 |
Method | Dataset | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
Xiao et al. [2] | TableBank | 91.3 | – | – |
DeepDeSRT [18] | SciTSR | 90.6 | 88.7 | 89.0 |
Kasar et al. [19] | ICDAR-2013 | 90.2 | – | 89.5 |
Raja et al. [20] | ICDAR-2019 | 92.7 | 91.3 | 92.0 |
Lin et al. [22] | PubTabNet, SciTSR | 95.8 | 96.5 | 95.1 |
Methods | Backbone | Dataset | P (%) | R (%) | F1 (%) | AP (%) |
---|---|---|---|---|---|---|
CornerNet+FRCN [4] | ResNet-18 | IIIT-AR-13K | 98.6 | 98.3 | 98.5 | 98.2 |
Faster R-CNN [22] | ResNet-101 | STDW | 92.6 | 90.5 | 91.5 | 94.0 |
Mask R-CNN [22] | ResNet-101 | STDW | 95.2 | 94.6 | 93.0 | 96.6 |
Faster R-CNN [25] | ResNet-101 | IIIT-AR-13K | 95.7 | 92.6 | 94.2 | 95.5 |
Mask R-CNN [25] | ResNet-101 | IIIT-AR-13K | 98.2 | 96.6 | 97.4 | 97.6 |
RetinaNet [27] | ResNet-50 | STDW | 93.9 | 91.9 | 92.9 | 95.3 |
Selective Search [28] | – | STDW | 91.8 | 89.3 | 90.5 | 93.9 |
Ours | ResNet-18 | IIIT-AR-13K | 98.8 | 98.4 | 98.7 | 98.4 |
Ours | ResNet-18 | STDW | 96.4 | 96.6 | 97.5 | 96.8 |
Methods | Dataset | P (%) | R (%) | F1 (%) |
---|---|---|---|---|
Split-Merge [4] | PubTabNet | 95.3 | 94.0 | 94.6 |
Split-Merge [4] | SciTSR | 99.4 | 99.1 | 99.0 |
Tabby [7] | SciTSR | 92.6 | 92.0 | 92.1 |
GTE [11] | PubTabNet | 91.2 | 89.7 | 90.4 |
LGPMA [17] | PubTabNet | 90.6 | 90.0 | 90.3 |
DeepDeSRT [18] | SciTSR | 90.6 | 88.7 | 89.0 |
TabStruct-Net [20] | SciTSR | 92.7 | 91.3 | 92.0 |
TabStruct-Net [20] | PubTabNet | 95.3 | 94.0 | 94.6 |
EDD [22] | PubTabNet | 89.0 | 86.3 | 87.6 |
Ours | SciTSR | 99.6 | 99.3 | 99.2 |
Ours | PubTabNet | 97.6 | 97.1 | 97.4 |