GRAAL: Graph-Based Retrieval for Collecting Related Passages across Multiple Documents
Abstract
1. Introduction
- We propose a novel fine-grained graph-based method for retrieving passages relevant to a claim or question. Unlike previous approaches, our method models the associations among the entities mentioned within a sentence at a finer granularity, by examining the semantic relationships between them.
- We perform an extensive experimental analysis and demonstrate that our approach retrieves a significantly more compact set of passages while still accurately targeting the relevant text.
- We perform a qualitative analysis of the results and discuss some real examples, which offer insight into how our fine-grained graph structure affects performance.
2. Related Work
3. Background
4. Method
4.1. Corpus Graph Construction
Algorithm 1: Corpus graph construction
4.2. Graph-Based Passage Retrieval
Algorithm 2: Search candidate evidence employing the corpus graph
Result: Return a small set of sentences S that entails claim c
Compute entity linking on c and put all mentions in …
return S
5. Experimental Analysis
- Insufficient text provided by the available entities. This is the most common case of failure. For instance, the claim “Stanley Tucci performed in a television series” cannot be verified because the claim contains only one linkable entity, “Stanley Tucci”, and the text associated with it in the reference corpus is not sufficient to validate the claim. The complete evidence requires a passage from the page “Monk (TV series)” stating that Monk (in which Stanley Tucci appeared) is a television series. Note that this issue might be resolved by a more general entity linker capable of linking broad concepts such as “television series”.
- Missing or incorrect entity linking from the input. When the entity linking module fails to detect entities in the input (claim or question), the complete subgraph of the corpus graph cannot be identified, so important passages are missed. For instance, in the claim “Mickey Rooney was in a film based on the novel The Black Stallion by Walter Farley” the mention “The Black Stallion” is incorrectly linked to “The Black Leather Jacket”. Therefore, the retrieved subgraph cannot contain complete evidence.
- Missing or incorrect connections in the corpus graph. This can be due to various reasons, such as the failure of the entity linker to correctly detect an entity from the reference corpus or the inability to detect relations across sentences. For example, the claim “Grace Kelly did not work with Alfred Hitchcock” cannot be contradicted since the information that Grace Kelly worked in Rear Window (directed by Alfred Hitchcock) is contained in the reference corpus under “Other notable works include… Rear Window…”. In this context, it is clear that the sentence refers to Grace Kelly, but this connection is missed during the generation of the corpus graph.
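The third failure mode can be illustrated with a toy example. The sketch below is not the GRAAL algorithm: it assumes a deliberately simplified corpus graph in which two entities are adjacent whenever a single sentence mentions both, and it collects the sentences along one shortest path between two claim entities. The sentence ids `s1` and `s2`, the function name `connecting_sentences`, and the entity sets are hypothetical, chosen to mirror the Grace Kelly example.

```python
from collections import deque


def connecting_sentences(sentence_entities, source, target):
    """Return sentence ids on one shortest entity-to-entity path, or [] if none."""
    # Index: entity -> ids of the sentences that mention it.
    by_entity = {}
    for sid, entities in sentence_entities.items():
        for entity in entities:
            by_entity.setdefault(entity, []).append(sid)

    # Breadth-first search over entities; a sentence mentioning two
    # entities acts as the edge between them (simplifying assumption).
    parent = {source: None}
    queue = deque([source])
    while queue:
        entity = queue.popleft()
        if entity == target:
            break
        for sid in by_entity.get(entity, []):
            for neighbour in sorted(sentence_entities[sid]):
                if neighbour not in parent:
                    parent[neighbour] = (entity, sid)
                    queue.append(neighbour)

    if target not in parent:
        return []  # the two entities are disconnected in this graph

    # Walk back from the target, collecting the sentences used as edges.
    path, node = [], target
    while parent[node] is not None:
        node, sid = parent[node]
        path.append(sid)
    return path[::-1]


# s1: a sentence stating that Rear Window was directed by Alfred Hitchcock.
# s2: "Other notable works include... Rear Window...", where Grace Kelly is
#     only implied, so the graph as extracted does not link her to s2.
as_extracted = {"s1": {"Rear Window", "Alfred Hitchcock"}, "s2": {"Rear Window"}}
with_implicit_subject = {
    "s1": {"Rear Window", "Alfred Hitchcock"},
    "s2": {"Rear Window", "Grace Kelly"},
}

print(connecting_sentences(as_extracted, "Grace Kelly", "Alfred Hitchcock"))
print(connecting_sentences(with_implicit_subject, "Grace Kelly", "Alfred Hitchcock"))
```

In the first graph, Grace Kelly has no edges, so no evidence path exists. Once the implicit subject of `s2` is resolved, the two sentences needed to contradict the claim become reachable.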
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Method | Avg. #Sentences | Avg. #Documents | Hit Rate | Overall |
---|---|---|---|---|
Entity + Mention | 341.2 | 257.4 | 78.9% | 43% |
GraphRetrieve | 129.9 | 92.4 | 70.9% | 74% |
GRAAL | 116.3 | 81.1 | 70.2% | 77% |
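To make the compactness gain concrete, the following sketch computes the relative reduction in retrieved sentences and documents from the figures in the table above. The helper name `relative_reduction` and the dictionary layout are illustrative choices, not part of the original work.

```python
# Average sizes of the retrieved sets, copied from the results table.
results = {
    "Entity + Mention": {"sentences": 341.2, "documents": 257.4},
    "GraphRetrieve": {"sentences": 129.9, "documents": 92.4},
    "GRAAL": {"sentences": 116.3, "documents": 81.1},
}


def relative_reduction(baseline, value):
    """Fraction by which `value` is smaller than `baseline`."""
    return (baseline - value) / baseline


for baseline_name in ("Entity + Mention", "GraphRetrieve"):
    for key in ("sentences", "documents"):
        r = relative_reduction(results[baseline_name][key], results["GRAAL"][key])
        print(f"GRAAL vs {baseline_name}, {key}: {r:.1%} fewer")
```

Under these figures, GRAAL returns roughly 10% fewer sentences and 12% fewer documents than GraphRetrieve, while the hit rate in the same table drops by 0.7 percentage points (70.9% to 70.2%).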
Method | Avg. Time (s) |
---|---|
Entity + Mention | 4.25 |
GraphRetrieve | 5.71 |
GRAAL | 5.90 |
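The cost of the finer-grained graph can be read off the timing table in the same way. This short sketch, with the illustrative helper `overhead`, expresses GRAAL's average time as absolute and relative overhead over each of the other two methods.

```python
# Average times in seconds, copied from the timing table.
avg_time_s = {"Entity + Mention": 4.25, "GraphRetrieve": 5.71, "GRAAL": 5.90}


def overhead(baseline, value):
    """Relative extra time of `value` over `baseline` (0.1 means 10% slower)."""
    return value / baseline - 1.0


for name in ("Entity + Mention", "GraphRetrieve"):
    extra = avg_time_s["GRAAL"] - avg_time_s[name]
    rel = overhead(avg_time_s[name], avg_time_s["GRAAL"])
    print(f"GRAAL vs {name}: +{extra:.2f} s ({rel:.1%} slower)")
```

With the reported values, GRAAL is about 0.19 s (roughly 3%) slower than GraphRetrieve on average.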
Claim | #Total Sentences (Entity + Mention) | #Total Sentences (GraphRetrieve) | #Total Sentences (GRAAL) |
---|---|---|---|
A singer in Got a Girl starred in Final Destination 3 | 48 | 29 | 29 |
Mickey Rooney was in a film based on the novel The Black Stallion by Walter Farley | 459 | 105 | 99 |
Emmy Rossum had a prominent role in a movie of which Maggie Greenwald was the director | 61 | 29 | 26 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mongiovì, M.; Gangemi, A. GRAAL: Graph-Based Retrieval for Collecting Related Passages across Multiple Documents. Information 2024, 15, 318. https://doi.org/10.3390/info15060318