PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification
Abstract
1. Introduction
- We introduce PaperNet-Dataset, which contains multi-modal data (text and figures) for fine-grained paper classification. To the best of our knowledge, this is the first multi-modal dataset for fine-grained paper classification. In addition, the data are pre-processed for convenience of use.
- Extensive experiments with current mainstream models were conducted to evaluate PaperNet. None of them reached 80% accuracy on the fine-grained tasks. This shows that fine-grained paper classification is a challenging task and that PaperNet can serve as a worthy benchmark.
- Additionally, we propose a multi-modal paper classification method as a potential direction for better performance. The proposed method combines the strengths of MobileNetV3 and Albert for multi-modal representation fusion and shows promising results.
2. Background and Related Works
2.1. Related Datasets
2.2. Multi-Modal Learning
2.3. Paper Document Classification
3. Method
3.1. Figure Feature Representation
3.2. Text Feature Representation
3.3. Multi-Modal Feature Fusion
3.4. Classification
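Sections 3.1–3.4 correspond to the four pipeline stages: a figure encoder, a text encoder, a fusion module, and a classification layer. The following is a minimal PyTorch sketch of a pipeline of this kind, using MobileNetV3 as the figure encoder and Albert as the text encoder as described in the Introduction. The projection sizes, the `albert-base-v2` checkpoint, and the simple concatenation-based fusion head are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_small
from transformers import AlbertModel


class MultiModalPaperClassifier(nn.Module):
    """Sketch: MobileNetV3 encodes figures, Albert encodes text, features are fused."""

    def __init__(self, num_classes: int = 20):
        super().__init__()
        backbone = mobilenet_v3_small(weights="DEFAULT")
        # Keep the convolutional features and global pooling; drop the ImageNet head.
        self.figure_encoder = nn.Sequential(
            backbone.features, backbone.avgpool, nn.Flatten()
        )
        self.figure_proj = nn.Linear(576, 256)  # 576 = mobilenet_v3_small feature width
        self.text_encoder = AlbertModel.from_pretrained("albert-base-v2")
        self.text_proj = nn.Linear(self.text_encoder.config.hidden_size, 256)
        # Fusion by concatenation of the projected modality features, then classify.
        self.classifier = nn.Linear(2 * 256, num_classes)

    def forward(self, images, input_ids, attention_mask):
        fig = self.figure_proj(self.figure_encoder(images))
        txt = self.text_proj(
            self.text_encoder(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        )
        return self.classifier(torch.cat([fig, txt], dim=-1))
```

In use, `images` would be normalized figure crops and `input_ids`/`attention_mask` would come from the matching Albert tokenizer.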
4. Dataset
4.1. PaperNet-Dataset
- PaperNet_2. The PaperNet_2 dataset is for coarse-grained paper classification and contains 2 classes: CV and NLP.
- PaperNet_20. The PaperNet_20 dataset is for fine-grained paper classification. It contains 20 classes: 7 in CV and 13 in NLP.
- PaperNet_CV. The PaperNet_CV dataset is the CV subset of PaperNet_20 and includes its 7 CV classes.
- PaperNet_NLP. The PaperNet_NLP dataset is the NLP subset of PaperNet_20 and includes its 13 NLP classes (the full hierarchy is written out below).
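For reference, the subset structure can be written out as a plain mapping. This is a sketch; the class names are copied from the result tables later in this paper.

```python
# PaperNet_20 label hierarchy: 2 coarse classes -> 20 fine-grained classes.
PAPERNET_20 = {
    "CV": [
        "CV_attention", "CV_classification", "CV_detection", "CV_GAN",
        "CV_recognition", "CV_retrieval", "CV_segmentation",
    ],  # PaperNet_CV: 7 classes
    "NLP": [
        "NLP_bert", "NLP_conversation", "NLP_cross", "NLP_extraction",
        "NLP_Few_shot", "NLP_knowledge_graph", "NLP_machine_reading",
        "NLP_machine_translation", "NLP_multilingual", "NLP_multimodal",
        "NLP_named_entity_recognition", "NLP_sentiment_analysis",
        "NLP_text_generation",
    ],  # PaperNet_NLP: 13 classes
}
```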
4.2. Data Pre-Processing and Feature Engineering
5. Experiment
5.1. Algorithms
- ResNet50: Residual Networks [26] are widely used in image classification and as backbones for other computer vision tasks.
- DenseNet121: DenseNet [27] alleviates the vanishing-gradient problem, strengthens feature propagation, and reduces the number of parameters.
- MobileNetV3: MobileNetV3 [28] is a light-weight model that combines manual network design with neural architecture search (NAS).
- ULMFiT: Universal Language Model Fine-tuning (ULMFiT) [29] is a transfer-learning method that can be applied to a variety of NLP tasks.
- Albert: Albert [30] achieves better results than BERT with fewer parameters. We use Albert as the text encoder in our proposed model.
- Concat: Previous work [25] concatenates the feature vectors of different modalities as the input of the classification layer. We implement this concatenation baseline with our own modality-specific feature vectors and apply it for classification (sketched below).
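A minimal sketch of the Concat baseline, assuming feature vectors have already been extracted for each modality. The dimensions and class count below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Illustrative pre-extracted feature vectors for a batch of 32 papers;
# in the paper these would come from the text and figure encoders.
text_features = torch.randn(32, 768)    # e.g., Albert sentence embeddings
figure_features = torch.randn(32, 576)  # e.g., MobileNetV3 pooled features

# The Concat baseline fuses modalities by simple concatenation,
# then applies a single classification layer.
classifier = nn.Linear(768 + 576, 20)   # 20 fine-grained classes
logits = classifier(torch.cat([text_features, figure_features], dim=-1))
```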
5.2. Settings
5.3. Main Results
5.3.1. Text Classification
5.3.2. Image Classification
5.3.3. Multi-Modal Classification
5.4. Other Machine Learning Algorithms
6. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| CV | Computer Vision |
| NLP | Natural Language Processing |
| KNN | K-Nearest Neighbor |
| SVM | Support Vector Machine |
Appendix A
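The tables below report per-class precision, recall, and F1-score on each dataset for the evaluated models. For reference, with TP, FP, and FN denoting per-class true positives, false positives, and false negatives:

$$
\mathrm{Precision}=\frac{TP}{TP+FP},\qquad
\mathrm{Recall}=\frac{TP}{TP+FN},\qquad
F_1=\frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}
$$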
| Dataset | Class | Precision | Recall | F1-Score |
|---|---|---|---|---|
| PaperNet_2 | CV | 0.99 | 0.94 | 0.97 |
| | NLP | 0.96 | 0.99 | 0.98 |
| PaperNet_20 | CV_attention | 0.60 | 0.15 | 0.55 |
| | CV_classification | 0.52 | 0.47 | 0.50 |
| | CV_detection | 0.42 | 0.93 | 0.58 |
| | CV_GAN | 0.68 | 0.47 | 0.55 |
| | CV_recognition | 0.82 | 0.38 | 0.52 |
| | CV_retrieval | 0.99 | 0.05 | 0.10 |
| | CV_segmentation | 0.61 | 0.21 | 0.31 |
| | NLP_bert | 0.46 | 0.66 | 0.54 |
| | NLP_conversation | 0.67 | 0.81 | 0.73 |
| | NLP_cross | 0.43 | 0.45 | 0.44 |
| | NLP_extraction | 0.49 | 0.62 | 0.55 |
| | NLP_Few_shot | 0.83 | 0.21 | 0.33 |
| | NLP_knowledge_graph | 0.99 | 0.40 | 0.57 |
| | NLP_machine_reading | 0.99 | 0.05 | 0.10 |
| | NLP_machine_translation | 0.67 | 0.88 | 0.76 |
| | NLP_multilingual | 0.65 | 0.52 | 0.58 |
| | NLP_multimodal | 0.67 | 0.53 | 0.59 |
| | NLP_named_entity_recognition | 0.99 | 0.20 | 0.33 |
| | NLP_sentiment_analysis | 0.70 | 0.40 | 0.51 |
| | NLP_text_generation | 0.89 | 0.40 | 0.55 |
| PaperNet_CV | CV_attention | 0.67 | 0.15 | 0.24 |
| | CV_classification | 0.52 | 0.41 | 0.46 |
| | CV_detection | 0.40 | 0.95 | 0.56 |
| | CV_GAN | 0.41 | 0.20 | 0.27 |
| | CV_recognition | 0.70 | 0.23 | 0.35 |
| | CV_retrieval | 0.41 | 0.21 | 0.27 |
| | CV_segmentation | 0.50 | 0.15 | 0.24 |
| PaperNet_NLP | NLP_bert | 0.42 | 0.65 | 0.51 |
| | NLP_conversation | 0.64 | 0.79 | 0.71 |
| | NLP_cross | 0.44 | 0.41 | 0.43 |
| | NLP_extraction | 0.46 | 0.62 | 0.52 |
| | NLP_Few_shot | 0.75 | 0.12 | 0.21 |
| | NLP_knowledge_graph | 0.99 | 0.25 | 0.40 |
| | NLP_machine_reading | 0.51 | 0.12 | 0.20 |
| | NLP_machine_translation | 0.57 | 0.86 | 0.69 |
| | NLP_multilingual | 0.58 | 0.42 | 0.49 |
| | NLP_multimodal | 0.71 | 0.50 | 0.59 |
| | NLP_named_entity_recognition | 0.80 | 0.20 | 0.32 |
| | NLP_sentiment_analysis | 0.69 | 0.31 | 0.43 |
| | NLP_text_generation | 0.50 | 0.13 | 0.20 |
| Dataset | Class | Precision | Recall | F1-Score |
|---|---|---|---|---|
| PaperNet_2 | CV | 0.96 | 0.96 | 0.96 |
| | NLP | 0.97 | 0.97 | 0.97 |
| PaperNet_20 | CV_attention | 0.83 | 0.85 | 0.84 |
| | CV_classification | 0.93 | 0.84 | 0.88 |
| | CV_detection | 0.93 | 0.97 | 0.95 |
| | CV_GAN | 0.73 | 0.60 | 0.66 |
| | CV_recognition | 0.95 | 0.90 | 0.92 |
| | CV_retrieval | 0.95 | 0.90 | 0.92 |
| | CV_segmentation | 0.92 | 0.87 | 0.89 |
| | NLP_bert | 0.82 | 0.80 | 0.81 |
| | NLP_conversation | 0.81 | 0.85 | 0.83 |
| | NLP_cross | 0.76 | 0.79 | 0.77 |
| | NLP_extraction | 0.78 | 0.82 | 0.80 |
| | NLP_Few_shot | 0.59 | 0.67 | 0.63 |
| | NLP_knowledge_graph | 0.54 | 0.70 | 0.61 |
| | NLP_machine_reading | 0.91 | 0.50 | 0.65 |
| | NLP_machine_translation | 0.86 | 0.86 | 0.86 |
| | NLP_multilingual | 0.80 | 0.82 | 0.81 |
| | NLP_multimodal | 0.75 | 0.70 | 0.72 |
| | NLP_named_entity_recognition | 0.74 | 0.85 | 0.79 |
| | NLP_sentiment_analysis | 0.59 | 0.74 | 0.66 |
| | NLP_text_generation | 0.68 | 0.65 | 0.67 |
| PaperNet_CV | CV_attention | 0.52 | 0.99 | 0.68 |
| | CV_classification | 0.99 | 0.81 | 0.90 |
| | CV_detection | 0.92 | 0.98 | 0.95 |
| | CV_GAN | 0.50 | 0.49 | 0.50 |
| | CV_recognition | 0.99 | 0.05 | 0.10 |
| | CV_retrieval | 0.51 | 0.50 | 0.50 |
| | CV_segmentation | 0.88 | 0.99 | 0.94 |
| PaperNet_NLP | NLP_bert | 0.87 | 0.85 | 0.86 |
| | NLP_conversation | 0.96 | 0.83 | 0.89 |
| | NLP_cross | 0.87 | 0.93 | 0.90 |
| | NLP_extraction | 0.83 | 0.85 | 0.84 |
| | NLP_Few_shot | 0.69 | 0.75 | 0.72 |
| | NLP_knowledge_graph | 0.68 | 0.75 | 0.71 |
| | NLP_machine_reading | 0.56 | 0.45 | 0.50 |
| | NLP_machine_translation | 0.92 | 0.90 | 0.91 |
| | NLP_multilingual | 0.84 | 0.84 | 0.84 |
| | NLP_multimodal | 0.93 | 0.93 | 0.93 |
| | NLP_named_entity_recognition | 0.75 | 0.90 | 0.82 |
| | NLP_sentiment_analysis | 0.76 | 0.89 | 0.82 |
| | NLP_text_generation | 0.93 | 0.70 | 0.80 |
| Dataset | Class | Precision | Recall | F1-Score |
|---|---|---|---|---|
| PaperNet_2 | CV | 0.98 | 0.94 | 0.96 |
| | NLP | 0.96 | 0.98 | 0.97 |
| PaperNet_20 | CV_attention | 0.24 | 0.23 | 0.23 |
| | CV_classification | 0.44 | 0.44 | 0.44 |
| | CV_detection | 0.57 | 0.75 | 0.65 |
| | CV_GAN | 0.44 | 0.60 | 0.50 |
| | CV_recognition | 0.62 | 0.55 | 0.58 |
| | CV_retrieval | 0.38 | 0.15 | 0.21 |
| | CV_segmentation | 0.54 | 0.40 | 0.46 |
| | NLP_bert | 0.62 | 0.64 | 0.63 |
| | NLP_conversation | 0.71 | 0.79 | 0.75 |
| | NLP_cross | 0.49 | 0.52 | 0.50 |
| | NLP_extraction | 0.68 | 0.60 | 0.64 |
| | NLP_Few_shot | 0.48 | 0.46 | 0.47 |
| | NLP_knowledge_graph | 0.88 | 0.75 | 0.81 |
| | NLP_machine_reading | 0.99 | 0.50 | 0.67 |
| | NLP_machine_translation | 0.73 | 0.85 | 0.79 |
| | NLP_multilingual | 0.66 | 0.54 | 0.59 |
| | NLP_multimodal | 0.77 | 0.67 | 0.71 |
| | NLP_named_entity_recognition | 0.80 | 0.60 | 0.69 |
| | NLP_sentiment_analysis | 0.57 | 0.60 | 0.58 |
| | NLP_text_generation | 0.92 | 0.55 | 0.69 |
| PaperNet_CV | CV_attention | 0.45 | 0.38 | 0.41 |
| | CV_classification | 0.53 | 0.47 | 0.50 |
| | CV_detection | 0.56 | 0.75 | 0.64 |
| | CV_GAN | 0.48 | 0.58 | 0.53 |
| | CV_recognition | 0.64 | 0.48 | 0.55 |
| | CV_retrieval | 0.70 | 0.35 | 0.47 |
| | CV_segmentation | 0.55 | 0.44 | 0.49 |
| PaperNet_NLP | NLP_bert | 0.60 | 0.62 | 0.61 |
| | NLP_conversation | 0.67 | 0.73 | 0.70 |
| | NLP_cross | 0.53 | 0.55 | 0.54 |
| | NLP_extraction | 0.71 | 0.66 | 0.69 |
| | NLP_Few_shot | 0.45 | 0.58 | 0.51 |
| | NLP_knowledge_graph | 0.89 | 0.80 | 0.84 |
| | NLP_machine_reading | 0.99 | 0.50 | 0.67 |
| | NLP_machine_translation | 0.72 | 0.83 | 0.77 |
| | NLP_multilingual | 0.57 | 0.50 | 0.53 |
| | NLP_multimodal | 0.82 | 0.90 | 0.86 |
| | NLP_named_entity_recognition | 0.80 | 0.60 | 0.69 |
| | NLP_sentiment_analysis | 0.64 | 0.66 | 0.65 |
| | NLP_text_generation | 0.90 | 0.45 | 0.60 |
| Dataset | Class | Precision | Recall | F1-Score |
|---|---|---|---|---|
| PaperNet_2 | CV | 0.97 | 0.99 | 0.98 |
| | NLP | 0.98 | 0.99 | 0.98 |
| PaperNet_20 | CV_attention | 0.63 | 0.30 | 0.41 |
| | CV_classification | 0.63 | 0.80 | 0.70 |
| | CV_detection | 0.68 | 0.96 | 0.79 |
| | CV_GAN | 0.82 | 0.62 | 0.71 |
| | CV_recognition | 0.86 | 0.73 | 0.79 |
| | CV_retrieval | 0.99 | 0.05 | 0.10 |
| | CV_segmentation | 0.87 | 0.87 | 0.87 |
| | NLP_bert | 0.62 | 0.93 | 0.74 |
| | NLP_conversation | 0.93 | 0.79 | 0.85 |
| | NLP_cross | 0.75 | 0.77 | 0.76 |
| | NLP_extraction | 0.75 | 0.81 | 0.78 |
| | NLP_Few_shot | 0.88 | 0.29 | 0.44 |
| | NLP_knowledge_graph | 0.99 | 0.50 | 0.67 |
| | NLP_machine_reading | 0.99 | 0.30 | 0.46 |
| | NLP_machine_translation | 0.83 | 0.95 | 0.88 |
| | NLP_multilingual | 0.91 | 0.78 | 0.84 |
| | NLP_multimodal | 0.88 | 0.77 | 0.82 |
| | NLP_named_entity_recognition | 0.94 | 0.75 | 0.83 |
| | NLP_sentiment_analysis | 0.77 | 0.69 | 0.73 |
| | NLP_text_generation | 0.99 | 0.70 | 0.82 |
| PaperNet_CV | CV_attention | 0.75 | 0.30 | 0.43 |
| | CV_classification | 0.68 | 0.79 | 0.73 |
| | CV_detection | 0.62 | 0.96 | 0.75 |
| | CV_GAN | 0.87 | 0.58 | 0.69 |
| | CV_recognition | 0.84 | 0.68 | 0.75 |
| | CV_retrieval | 0.99 | 0.05 | 0.10 |
| | CV_segmentation | 0.89 | 0.75 | 0.81 |
| PaperNet_NLP | NLP_bert | 0.57 | 0.94 | 0.71 |
| | NLP_conversation | 0.93 | 0.79 | 0.85 |
| | NLP_cross | 0.76 | 0.73 | 0.75 |
| | NLP_extraction | 0.70 | 0.82 | 0.76 |
| | NLP_Few_shot | 0.90 | 0.38 | 0.53 |
| | NLP_knowledge_graph | 0.99 | 0.50 | 0.67 |
| | NLP_machine_reading | 0.99 | 0.10 | 0.18 |
| | NLP_machine_translation | 0.84 | 0.97 | 0.90 |
| | NLP_multilingual | 0.93 | 0.76 | 0.84 |
| | NLP_multimodal | 0.89 | 0.80 | 0.84 |
| | NLP_named_entity_recognition | 0.94 | 0.85 | 0.89 |
| | NLP_sentiment_analysis | 0.86 | 0.71 | 0.78 |
| | NLP_text_generation | 0.99 | 0.60 | 0.75 |
References
1. Zyuzin, V.; Ronkin, M.; Porshnev, S.; Kalmykov, A. Automatic Asbestos Control Using Deep Learning Based Computer Vision System. Appl. Sci. 2021, 11, 532.
2. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
3. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; Volume 30.
4. Dhaliwal, S.S.; Nahid, A.A.; Abbas, R. Effective Intrusion Detection System Using XGBoost. Information 2018, 9, 149.
5. Mukhamediev, R.I.; Symagulov, A.; Kuchin, Y.; Yakunin, K.; Yelis, M. From Classical Machine Learning to Deep Neural Networks: A Simplified Scientometric Review. Appl. Sci. 2021, 11, 5541.
6. Ma, X.; Wang, R. Personalized Scientific Paper Recommendation Based on Heterogeneous Graph Representation. IEEE Access 2019, 7, 79887–79894.
7. Adhikari, A.; Ram, A.; Tang, R.; Lin, J. DocBERT: BERT for Document Classification. arXiv 2019, arXiv:1904.08398.
8. Quan, J.; Li, Q.; Li, M. Computer Science Paper Classification for CSAR. In New Horizons in Web Based Learning; Cao, Y., Väljataga, T., Tang, J.K., Leung, H., Laanpere, M., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 34–43.
9. Apté, C.; Damerau, F.; Weiss, S.M. Automated Learning of Decision Rules for Text Categorization. ACM Trans. Inf. Syst. 1994, 12, 233–251.
10. Yang, P.; Sun, X.; Li, W.; Ma, S.; Wu, W.; Wang, H. SGM: Sequence Generation Model for Multi-Label Classification. arXiv 2018, arXiv:1806.04822.
11. Jobin, K.; Mondal, A.; Jawahar, C. DocFigure: A Dataset for Scientific Document Figure Classification. In Proceedings of the 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), Sydney, NSW, Australia, 22–25 September 2019; Volume 1, pp. 74–79.
12. Cadene, R.; Ben-younes, H.; Cord, M.; Thome, N. MUREL: Multimodal Relational Reasoning for Visual Question Answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
13. Zhu, J.; Zhou, Y.; Zhang, J.; Li, H.; Zong, C.; Li, C. Multimodal Summarization with Guidance of Multimodal Reference. Proc. AAAI Conf. Artif. Intell. 2020, 34, 9749–9756.
14. Qian, S.; Zhang, T.; Xu, C.; Shao, J. Multi-Modal Event Topic Model for Social Event Analysis. IEEE Trans. Multimed. 2016, 18, 233–246.
15. Xia, Y.; Zhang, L.; Liu, Z.; Nie, L.; Li, X. Weakly Supervised Multimodal Kernel for Categorizing Aerial Photographs. IEEE Trans. Image Process. 2017, 26, 3748–3758.
16. Zadeh, A.; Chen, M.; Poria, S.; Cambria, E.; Morency, L.P. Tensor Fusion Network for Multimodal Sentiment Analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, 7–11 September 2017.
17. Liu, J.; Chang, W.C.; Wu, Y.; Yang, Y. Deep Learning for Extreme Multi-Label Text Classification. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, 7–11 August 2017; pp. 115–124.
18. Kim, Y. Convolutional Neural Networks for Sentence Classification. arXiv 2014, arXiv:1408.5882.
19. Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489.
20. Nguyen, D.B.; Shenify, M.; Al-Mubaid, H. Biomedical Text Classification with Improved Feature Weighting Method. In Proceedings of the International Conference on Bioinformatics and Computational Biology, Las Vegas, NV, USA, 4–6 April 2016.
21. Adhikari, A.; Ram, A.; Tang, R.; Lin, J. Rethinking Complex Neural Network Architectures for Document Classification. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4046–4051.
22. Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep Contextualized Word Representations. arXiv 2018, arXiv:1802.05365.
23. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. arXiv 2020, arXiv:2005.14165.
24. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
25. Schifanella, R.; de Juan, P.; Tetreault, J.; Cao, L. Detecting Sarcasm in Multimodal Social Platforms. In Proceedings of the 24th ACM International Conference on Multimedia (MM '16); Association for Computing Machinery: New York, NY, USA, 2016; pp. 1136–1145.
26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
27. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
28. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 1314–1324.
29. Howard, J.; Ruder, S. Universal Language Model Fine-Tuning for Text Classification. arXiv 2018, arXiv:1801.06146.
30. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations. arXiv 2020, arXiv:1909.11942.
31. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
32. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980.
| Modality | Dataset | Class | Sample | Avg. Words |
|---|---|---|---|---|
| Text | Reuters | 90 | 10,789 | 144.3 |
| | AAPD | 54 | 55,840 | 167.3 |
| | IMDB | 10 | 135,669 | 393.8 |
| Figure | CIFAR-10 | 10 | 60,000 | - |
| | DocFigure | 28 | 33,000 | - |
| | Deepchart | 5 | 5000 | - |
| | CUB | 200 | 11,788 | - |
| Multi-modal | Food-101 | 101 | 90,704 | - |
| | PaperNet v1.0 | 20 | 38,608 | 150.43 |
| Dataset | Subset | Class | Text | Figure | Multi-Modal |
|---|---|---|---|---|---|
| PaperNet_2 | | CV | 1863 | 12,774 | 25,548 |
| | | NLP | 2538 | 6609 | 13,060 |
| Average | | Coarse-grained class | 2200.5 | 9691.5 | 19,304 |
| PaperNet_20 | PaperNet_CV | CV_attention | 201 | 1157 | 2314 |
| | | CV_classification | 387 | 1951 | 3902 |
| | | CV_detection | 621 | 4280 | 8560 |
| | | CV_GAN | 228 | 1932 | 3864 |
| | | CV_recognition | 284 | 1432 | 2864 |
| | | CV_retrieval | 82 | 270 | 540 |
| | | CV_segmentation | 260 | 1752 | 3504 |
| | PaperNet_NLP | NLP_bert | 372 | 1215 | 2430 |
| | | NLP_conversation | 262 | 698 | 1396 |
| | | NLP_cross | 277 | 576 | 1152 |
| | | NLP_extraction | 340 | 681 | 1362 |
| | | NLP_Few_shot | 119 | 309 | 614 |
| | | NLP_knowledge_graph | 82 | 244 | 488 |
| | | NLP_machine_reading | 59 | 104 | 208 |
| | | NLP_machine_translation | 490 | 832 | 1664 |
| | | NLP_multilingual | 253 | 557 | 1114 |
| | | NLP_multimodal | 145 | 829 | 1656 |
| | | NLP_named_entity_recognition | 105 | 189 | 378 |
| | | NLP_sentiment_analysis | 172 | 179 | 358 |
| | | NLP_text_generation | 100 | 121 | 240 |
| Average | | Fine-grained class | 241.95 | 965.4 | 1930.4 |
Classification accuracy (%) on the four datasets:

| Modality | Algorithm | PaperNet_2 | PaperNet_20 | PaperNet_CV | PaperNet_NLP |
|---|---|---|---|---|---|
| Image | ResNet50 | 83.94 ± 0.55 | 50.30 ± 0.66 | 60.23 ± 0.61 | 45.33 ± 0.50 |
| | DenseNet121 | 82.34 ± 0.96 | 45.16 ± 0.46 | 58.80 ± 0.40 | 46.21 ± 0.46 |
| | MobileNetV3 | 83.66 ± 0.86 | 48.38 ± 0.56 | 56.45 ± 0.40 | 40.08 ± 0.54 |
| Text | ULMFiT | 96.26 ± 0.44 | 71.30 ± 1.12 | 75.30 ± 1.06 | 73.33 ± 0.18 |
| | Albert | 96.31 ± 0.09 | 73.32 ± 0.24 | 73.18 ± 0.12 | 74.23 ± 0.36 |
| Multi-modal | Concat | 96.27 ± 0.11 | 73.45 ± 0.69 | 72.43 ± 0.27 | 75.36 ± 0.51 |
| | Our method | 97.05 ± 0.05 | 73.85 ± 0.39 | 74.26 ± 0.32 | 79.27 ± 0.37 |
| Dataset | Metric (%) | Naive Bayes | AdaBoost | KNN | SVM | Random Forest |
|---|---|---|---|---|---|---|
| PaperNet_2 | Precision | 97.91 ± 0.65 | 96.51 ± 0.56 | 96.87 ± 0.63 | 99.12 ± 0.62 | 97.23 ± 0.12 |
| | Recall | 97.20 ± 0.46 | 96.67 ± 0.32 | 96.42 ± 0.26 | 98.94 ± 0.33 | 97.58 ± 0.26 |
| | F1-score | 97.56 ± 0.23 | 96.66 ± 0.25 | 96.64 ± 0.21 | 98.96 ± 0.16 | 97.42 ± 0.22 |
| | Accuracy | 96.44 ± 0.36 | 94.08 ± 0.23 | 93.56 ± 0.43 | 96.98 ± 0.26 | 96.32 ± 0.16 |
| PaperNet_20 | Precision | 70.75 ± 1.31 | 79.15 ± 0.39 | 62.64 ± 1.31 | 83.79 ± 1.21 | 81.32 ± 0.17 |
| | Recall | 43.98 ± 0.63 | 78.36 ± 0.32 | 55.89 ± 0.75 | 66.72 ± 0.69 | 72.15 ± 0.08 |
| | F1-score | 46.92 ± 0.39 | 78.46 ± 0.16 | 58.01 ± 0.87 | 69.97 ± 0.52 | 76.46 ± 0.09 |
| | Accuracy | 50.22 ± 0.59 | 71.76 ± 0.26 | 53.01 ± 0.68 | 69.43 ± 0.31 | 72.89 ± 0.21 |
| PaperNet_CV | Precision | 50.89 ± 1.29 | 76.81 ± 0.89 | 55.85 ± 0.85 | 80.67 ± 1.12 | 82.12 ± 0.06 |
| | Recall | 32.85 ± 0.64 | 70.02 ± 0.49 | 49.33 ± 0.64 | 58.67 ± 0.62 | 75.43 ± 0.15 |
| | F1-score | 34.11 ± 0.67 | 65.33 ± 0.28 | 51.22 ± 0.47 | 60.98 ± 0.21 | 78.64 ± 0.08 |
| | Accuracy | 44.96 ± 0.91 | 72.40 ± 0.63 | 50.67 ± 0.56 | 64.34 ± 0.49 | 73.86 ± 0.36 |
| PaperNet_NLP | Precision | 61.58 ± 1.16 | 80.68 ± 0.67 | 70.65 ± 0.83 | 86.56 ± 1.24 | 82.89 ± 0.18 |
| | Recall | 41.43 ± 0.87 | 81.29 ± 0.56 | 64.55 ± 0.65 | 68.85 ± 0.72 | 80.25 ± 0.16 |
| | F1-score | 43.81 ± 0.64 | 81.04 ± 0.51 | 66.51 ± 0.39 | 72.64 ± 0.51 | 81.55 ± 0.11 |
| | Accuracy | 48.16 ± 0.89 | 79.01 ± 0.32 | 56.97 ± 0.41 | 68.92 ± 0.55 | 78.89 ± 0.26 |
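As a rough guide to reproducing baselines of this kind, the sketch below runs the same five classical algorithms on TF-IDF text features with scikit-learn. The feature choice and hyper-parameters are our assumptions; the paper's exact settings are not reproduced here.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def evaluate(clf, train_texts, train_labels, test_texts, test_labels):
    """Fit one baseline on TF-IDF features and print per-class P/R/F1."""
    model = make_pipeline(TfidfVectorizer(max_features=20000), clf)
    model.fit(train_texts, train_labels)
    print(classification_report(test_labels, model.predict(test_texts)))


# The five baselines from the table above, with default hyper-parameters:
BASELINES = [
    MultinomialNB(),
    AdaBoostClassifier(),
    KNeighborsClassifier(),
    LinearSVC(),
    RandomForestClassifier(),
]
```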