Survey on the Biomedical Text Summarization Techniques with an Emphasis on Databases, Techniques, Semantic Approaches, Classification Techniques, and Similarity Measures
Abstract
1. Introduction
1.1. Significance and Rationale
1.2. Motivation and Applications
1.3. Objectives
- Study of the significant, highly cited biomedical databases, the search filters, and the query strings used in this survey of BTS.
- Study of the popular clinical knowledge sources, ontologies, dictionaries, vocabularies, and their applications in BTS for semantic enrichment of text.
- Listing of the different similarity metrics used in BTS.
- Study of recent research works in BTS.
- Study of state-of-the-art research on BQA systems to investigate the challenges in the domain.
1.4. Prior Research
- Databases: For appropriate text summarization of biomedical text documents, it is necessary to investigate and explore databases with their applications and structures.
- Semantic Enrichment Approaches: As semantic enrichment plays a crucial role in capturing contextual relations between text sequences, our survey focuses on various approaches to semantic enrichment.
- Text Similarity Metrics: The survey focuses on commonly used text similarity metrics in the biomedical domain.
- Text Summarization Techniques and Applications: A comparative analysis of various text summarization systems, with particular emphasis on biomedical question answering systems.
1.5. Research Goal
1.6. Contribution of Work
- This study examines the underlying theories and evolution of automatic biomedical text-summarizing systems by conducting a systematic literature review.
- The survey analyzes current databases, the feature extraction techniques employed, semantic enrichment approaches, text summarization approaches and algorithms, assessment metrics, and open challenges.
- The review compares various existing BQA systems based on their approaches, question processing and formulation techniques, passage retrieval and answer processing methods, and datasets. It also explains the limitations of such systems.
- The study concludes with the identification of present issues and challenges in biomedical ATS architectures, as well as future research goals.
- The work culminates in proposing a framework of a biomedical question answering system using the potential of text summarization on the biomedical corpus. The study of research gaps in the discussion section shows the scope for the design of automated BQA with unique features such as heuristics for sentence extraction, Document Screening, and Context-Aware Semantic Enrichment technique.
- As shown in Figure 2, the rest of this paper is organized into distinct sections.
2. Research Method
2.1. Eligibility Criteria and Information Sources
2.2. Keyword Search
3. Detailed Analysis of Literature
3.1. Biomedical Databases
3.2. Semantic Enrichment Approaches
- A. Distributional or Corpus-based Approach
- Initially, word vectors are created from the corpus using word co-occurrence.
- Next, concept descriptors are retrieved from an information source such as a dictionary or thesaurus, and they can be extended to incorporate descriptor terms from related concepts [68].
- Finally, the term vectors that match the concept descriptors are aggregated to form context vectors.
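The three steps above can be sketched in Python as follows. This is a minimal illustration and not the implementation used in any cited work. Several choices are assumptions: the toy corpus, the glosses that stand in for dictionary or thesaurus entries, the co-occurrence window of 2, and summation as the aggregation operator. The sketch builds co-occurrence word vectors, collects the descriptor terms for each concept, and sums the matching word vectors into a context vector. It then compares concepts with cosine similarity.

```python
import math
from collections import Counter, defaultdict

# Toy corpus and concept glosses: hypothetical data for illustration only.
CORPUS = [
    "the patient has high blood pressure",
    "hypertension means high blood pressure in the arteries",
    "the tumor was removed by surgery",
    "chemotherapy treats the tumor after surgery",
]
GLOSSES = {
    "hypertension": "high arterial blood pressure",
    "hypotension": "low arterial blood pressure",
    "neoplasm": "abnormal tumor tissue growth",
}


def cooccurrence_vectors(sentences, window=2):
    """Step 1: one sparse vector per word, counting neighbours within +/- window tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def descriptor_terms(concept, glosses, related=()):
    """Step 2: descriptor terms of a concept, optionally extended with related concepts' terms."""
    terms = glosses[concept].lower().split()
    for other in related:
        terms += glosses[other].lower().split()
    return terms


def context_vector(terms, vectors):
    """Step 3: sum the word vectors of all descriptor terms found in the corpus."""
    total = Counter()
    for term in terms:
        total.update(vectors.get(term, Counter()))  # terms unseen in the corpus add nothing
    return total


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vectors = cooccurrence_vectors(CORPUS)
ctx = {c: context_vector(descriptor_terms(c, GLOSSES), vectors) for c in GLOSSES}
print("hypertension vs hypotension:", round(cosine(ctx["hypertension"], ctx["hypotension"]), 3))
print("hypertension vs neoplasm:", round(cosine(ctx["hypertension"], ctx["neoplasm"]), 3))
```

With this toy data, "hypertension" and "hypotension" share the descriptor terms "blood" and "pressure". Their context vectors therefore score much higher than the "hypertension"/"neoplasm" pair.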
- Pretrained: It indicates whether the model has been pretrained on similar tasks (Y/N). Pretrained models converge faster because their weights are already optimized, which reduces training time and effort.
- OOV (Out of Vocabulary): It indicates whether the model can handle OOV terms, i.e., terms encountered in NLP that are not part of the usual lexicon (Y/N). Models that handle OOV terms are richer than those that do not.
- Prediction: It indicates whether the model is predictive, i.e., built, processed, and validated to predict future occurrences from known results (Y/N).
- Frequency: It indicates whether the model vectorizes text based on how frequently terms appear in the text or document (Y/N).
- Morphological Information: It indicates whether the model captures the structure of words and their relationships (Y/N).
- Work level: It indicates the levels at which a model, such as an embedding, can be applied: individual words, phrases, paragraphs, or whole texts.
- Evaluation: It explains the model’s benefits and drawbacks.
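The OOV and Morphological Information columns can be illustrated with the subword idea behind fastText-style embeddings, which represent a word by its character n-grams. The sketch below simplifies heavily. It uses a hypothetical four-word vocabulary. It also measures Jaccard overlap between n-gram sets instead of summing trained n-gram vectors, which is what such models actually do. The point it makes is limited: a whole-word lookup fails for an unseen term such as "cardiomyopathy", while that term's subwords still overlap with known words.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """fastText-style subwords: character n-grams of '<word>' plus the full bracketed token."""
    token = f"<{word}>"
    grams = {token[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(token) - n + 1)}
    grams.add(token)
    return grams


# Hypothetical training vocabulary for illustration.
VOCAB = ["cardiology", "myopathy", "neuropathy", "influenza"]


def word_level_lookup(word, vocab):
    """A whole-word model has an entry only for words seen during training."""
    return word if word in vocab else None


def subword_overlap(word, vocab):
    """Jaccard overlap of subword sets, used here as a proxy for shared morphology."""
    grams = char_ngrams(word)
    return {v: len(grams & char_ngrams(v)) / len(grams | char_ngrams(v)) for v in vocab}


oov = "cardiomyopathy"
print("whole-word lookup:", word_level_lookup(oov, VOCAB))
for known, score in sorted(subword_overlap(oov, VOCAB).items(), key=lambda kv: -kv[1]):
    print(f"{known:12s} {score:.3f}")
```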
- B. Knowledge-based Approach
- (1) Dictionaries
- (2) Lexicons
- (3) Ontologies
- 1. Dictionaries
- 2. Lexicons
- 3. Ontologies
- Content: It specifies the types of terms or entities it contains, such as clinical terminology, medications, and morbidity entities, among others.
- Structure: It displays the type of relationship that exists between several terms.
- Classes: These are groups of different ontology concepts.
- Maximum Depth: It gives the maximum depth (number of tiers) of the hierarchy tree.
- Citations: It includes citations to articles that use the relevant ontology.
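The Structure and Maximum Depth columns, and the path-based similarity that ontologies make possible, can be sketched on a tiny is-a hierarchy. The concept names and links below are hypothetical and are not taken from SNOMED CT, MeSH, or any other ontology. Depth is counted from the root (depth 1) along the longest is-a path. Similarity uses one simple path-length measure, 1 / (1 + shortest path), which is only one of many knowledge-based measures in the literature.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parents); not taken from any real ontology.
IS_A = {
    "disease": [],
    "cardiovascular disease": ["disease"],
    "infectious disease": ["disease"],
    "hypertension": ["cardiovascular disease"],
    "myocarditis": ["cardiovascular disease", "infectious disease"],
    "viral myocarditis": ["myocarditis"],
    "influenza": ["infectious disease"],
}

# Undirected adjacency, so paths may go up to a shared ancestor and back down.
NEIGHBOURS = {concept: set(parents) for concept, parents in IS_A.items()}
for child, parents in IS_A.items():
    for parent in parents:
        NEIGHBOURS[parent].add(child)


def depth(concept):
    """Number of tiers on the longest is-a path from the concept up to a root (root = 1)."""
    return 1 + max((depth(p) for p in IS_A[concept]), default=0)


def max_depth():
    return max(depth(c) for c in IS_A)


def shortest_path(a, b):
    """Breadth-first search over the undirected is-a graph; None if unreachable."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in NEIGHBOURS[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    dist = shortest_path(a, b)
    return 0.0 if dist is None else 1.0 / (1 + dist)


print("maximum depth:", max_depth())
print(path_similarity("hypertension", "myocarditis"), path_similarity("hypertension", "influenza"))
```

Because "myocarditis" has two parents, the hierarchy is a directed acyclic graph rather than a strict tree. This is common in clinical ontologies, and it is why the depth is defined by the longest path.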
- C. Deep Learning-based Approaches
3.3. Text Similarity Metrics
- Length distance
- Distribution distance
- Semantic distance
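The length-distance and distribution-distance families can be made concrete with a minimal bag-of-words sketch. Cosine, Euclidean, and Manhattan distance operate on term-count vectors. KL and Jensen-Shannon divergence treat each text as a word distribution over a shared vocabulary. Whitespace tokenization and the two example sentences are assumptions. Semantic distance is omitted because it requires pretrained embeddings.

```python
import math
from collections import Counter


def bag_of_words(text):
    return Counter(text.lower().split())


# Length distance: geometric measures over term-count vectors.
def cosine_similarity(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def euclidean_distance(a, b):
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a.keys() | b.keys()))


def manhattan_distance(a, b):
    return sum(abs(a[k] - b[k]) for k in a.keys() | b.keys())


# Distribution distance: each text becomes a probability distribution over a shared vocabulary.
def to_distribution(counts, vocab):
    total = sum(counts.values())
    return {w: counts[w] / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q); infinite if q assigns zero probability to a word that p uses."""
    total = 0.0
    for w, pw in p.items():
        if pw > 0:
            if q[w] == 0:
                return math.inf
            total += pw * math.log(pw / q[w])
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, finite, bounded by log(2) with natural logs."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


s1 = bag_of_words("metformin lowers blood glucose")
s2 = bag_of_words("metformin reduces blood glucose levels")
vocab = s1.keys() | s2.keys()
p, q = to_distribution(s1, vocab), to_distribution(s2, vocab)
print(f"cosine={cosine_similarity(s1, s2):.3f} euclidean={euclidean_distance(s1, s2):.3f} "
      f"manhattan={manhattan_distance(s1, s2)} js={js_divergence(p, q):.3f}")
```

KL divergence is asymmetric and becomes infinite when one text uses a word the other lacks. This is one reason the symmetric, bounded Jensen-Shannon variant is convenient for comparing short texts.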
3.4. Comparative Study of Significant Biomedical Text Summarization
- Data Collection: Collection of text data from various relevant sources.
- Text Data Preprocessing: Linguistic techniques used to preprocess the input text documents, including sentence segmentation, punctuation removal, stop-word filtering, stemming [29], etc.
- Feature Extraction: Extracting and representing sentence features is vital to the entire summarization process, as it identifies topic sentences and the essential traits or attributes of the source document [29].
- Sentence Preparation: Encoding sentences as real-valued vectors for the subsequent summarization steps.
- Summarization Approach: Choosing the approach [8] is a first and important step in text summarization. Some strategies (extractive) select the most important words and sentences from the source documents, while others (abstractive) paraphrase sentences by condensing the original content.
- Summary: To obtain a high-quality summary of the source document, different algorithms and methods [9] are used within the chosen approach. At this stage, sentences are ranked and the top-ranked sentences are selected for inclusion in the summary.
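The pipeline stages above can be sketched as a small extractive summarizer. It is one simple instantiation under stated assumptions and not the method of any surveyed system:

- Preprocessing uses a naive regex sentence splitter and a tiny hypothetical stop-word list.
- Sentence representation uses sentence-level TF-IDF with smoothed IDF.
- Ranking uses cosine similarity to the document centroid.
- Selection keeps the top-k sentences and restores their original order.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use a full list, e.g. NLTK's).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "in", "of", "by", "for", "it", "and", "to", "some", "may"}


def split_sentences(text):
    """Naive segmentation on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def preprocess(sentence):
    """Lowercase, keep alphanumeric tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", sentence.lower()) if t not in STOP_WORDS]


def tfidf_vectors(token_lists):
    """Sentence-level TF-IDF with smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(text, k=2):
    sentences = split_sentences(text)
    vectors = tfidf_vectors([preprocess(s) for s in sentences])
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)  # sum of sentence vectors; the scale does not affect cosine
    scores = [cosine(vec, centroid) for vec in vectors]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # keep the original sentence order


DOC = ("Metformin is a first-line drug for type 2 diabetes. "
       "Metformin lowers blood glucose in diabetes patients. "
       "The clinic reopened after the holiday. "
       "Some diabetes patients report gastrointestinal effects of metformin.")
summary = summarize(DOC, k=2)
print(summary)
```

On this toy text, the off-topic "clinic" sentence shares no terms with the other sentences. It ranks lowest and is left out of the two-sentence summary.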
3.5. Comparative Study of Popular Works on Biomedical Question Answering Systems
- Question Processing: Analyzes and classifies the question and converts it into a search query.
- Document Processing: The query terms are used to retrieve a set of related documents.
- Passage Retrieval: NLP techniques are used to extract relevant passages from the retrieved documents.
- Answer Processing: Applies extraction techniques to the output of the document or passage processing module to produce an answer.
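The four modules can be wired together in a deliberately simple sketch. Everything here is an assumption for illustration:

- Question classification looks at the first word; a leading auxiliary verb signals a yes/no question.
- Document and passage retrieval rank by query-term overlap.
- Answer processing returns the top-ranked passage.

Real BQA systems replace each stage with stronger components, such as probabilistic retrieval models, UMLS concept matching, or neural readers.

```python
import re

# Illustrative stop words and yes/no cue words (assumptions, not from any particular system).
STOP_WORDS = {"what", "which", "is", "are", "a", "an", "the", "used", "for", "of", "in", "does", "do", "it"}
YES_NO_CUES = {"is", "are", "does", "do", "can"}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def process_question(question):
    """Question processing: classify the question type and build a bag-of-words query."""
    tokens = tokenize(question)
    qtype = "yes/no" if tokens and tokens[0] in YES_NO_CUES else "factoid"
    return qtype, [t for t in tokens if t not in STOP_WORDS]


def overlap(query, text):
    words = set(tokenize(text))
    return sum(term in words for term in query)


def retrieve_documents(query, documents, top_n=2):
    """Document processing: rank documents by query-term overlap and keep the matching ones."""
    ranked = sorted(documents, key=lambda d: overlap(query, d), reverse=True)
    return [d for d in ranked[:top_n] if overlap(query, d) > 0]


def retrieve_passages(query, documents):
    """Passage retrieval: split the retrieved documents into sentences and rank them."""
    passages = [s for d in documents for s in re.split(r"(?<=[.!?])\s+", d) if s]
    return sorted(passages, key=lambda p: overlap(query, p), reverse=True)


def process_answer(qtype, passages):
    """Answer processing: return the best-ranked passage as the answer snippet."""
    return {"type": qtype, "answer": passages[0] if passages else None}


DOCS = [
    "Metformin is a drug used for type 2 diabetes. It is taken orally.",
    "Influenza is a viral infection. Annual vaccines reduce the risk.",
]
qtype, query = process_question("What drug is used for type 2 diabetes?")
answer = process_answer(qtype, retrieve_passages(query, retrieve_documents(query, DOCS)))
print(answer)
```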
4. Discussion
4.1. RQ1. What Are the Various Biomedical Databases Available Online for Automated Biomedical Text Summarization?
4.2. RQ2. What Are the Different Semantic Enrichment Approaches Used in Biomedical Text Summarization and Their Comparative Evaluation?
4.3. RQ3. What Are the Different Text Similarity Metrics Used in Biomedical Text Summarization?
4.4. RQ4. What Are the Different Approaches Used for the Automatic Summarization of Biomedical Text and Their Comparative Analysis?
4.5. RQ5. What Are the Different Approaches Used for Automatic Biomedical Question Answering Systems and Their Comparative Analysis?
- Lack of access to a biomedical text corpus for summarizing data and its application to evidence-based medicine.
- Limited application of semantic enrichment approaches for better context-based BTS.
- Lack of proper heuristics for relevant document screening of biomedical text.
5. Proposed System
- Heuristics for sentence extraction
- Document Screening
- Context-Aware Semantic Enrichment
- Initially, the text will be converted to lowercase and split into separate words.
- Stemming to reduce each word to its root (stem) form.
- Lemmatization to convert a word to its meaningful base form.
- Removal of stop words using the NLTK library.
- Normalization to convert text into standard form.
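The preprocessing steps listed above can be sketched with the standard library alone. The stop-word list, lemma table, variant-spelling map, and suffix rules below are hypothetical toy stand-ins for the NLTK resources the proposed system names. The suffix rules are not the Porter algorithm. The sketch makes two assumed design choices that differ from a literal reading of the list. First, normalization and stop-word removal run before words are reduced. Second, stemming and lemmatization are offered as alternatives, because a lemma lookup generally expects an unstemmed word.

```python
import re

# Toy resources standing in for NLTK's stop-word list and WordNet lemmatizer (assumptions).
STOP_WORDS = {"the", "a", "an", "with", "were", "was", "is", "are", "of", "in", "and", "to", "for"}
NORMALIZATION = {"tumour": "tumor", "haemoglobin": "hemoglobin"}  # hypothetical variant-spelling map
LEMMAS = {"patients": "patient", "studies": "study", "mice": "mouse", "treated": "treat"}
SUFFIXES = ("ations", "ation", "ing", "ed", "es", "s")  # crude rules, not the Porter algorithm


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(token):
    """Map variant spellings to one standard form."""
    return NORMALIZATION.get(token, token)


def stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Dictionary lookup of the base form; unknown words are returned unchanged."""
    return LEMMAS.get(token, token)


def preprocess(text, use_stemming=False):
    tokens = [normalize(t) for t in tokenize(text)]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) if use_stemming else lemmatize(t) for t in tokens]


print(preprocess("The Patients were treated with Metformin."))
print(preprocess("Tumour studies in mice", use_stemming=True))
```

In practice, the toy resources would be replaced by their NLTK equivalents: `nltk.corpus.stopwords.words("english")`, `nltk.stem.PorterStemmer().stem(word)`, and `nltk.stem.WordNetLemmatizer().lemmatize(word)`. These require first downloading the `stopwords` and `wordnet` data with `nltk.download`.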
6. Limitations
7. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
References
- Mishra, R.; Bian, J.; Fiszman, M.; Weir, C.R.; Jonnalagadda, S.; Mostafa, J.; Del Fiol, G. Text summarization in the biomedical domain: A systematic review of recent research. J. Biomed. Inform. 2014, 52, 457–467.
- Afantenos, S.; Karkaletsis, V.; Stamatopoulos, P. Summarization from medical documents: A survey. Artif. Intell. Med. 2005, 33, 157–177.
- Moradi, M.; Ghadiri, N. Text Summarization in the Biomedical Domain. arXiv 2019, arXiv:1908.02285.
- Wang, M.; Wang, M.; Yu, F.; Yang, Y.; Walker, J.; Mostafa, J. A systematic review of automatic text summarization for biomedical literature and EHRs. J. Am. Med. Inform. Assoc. 2021, 28, 2287–2297.
- Chaves, A.; Kesiku, C.; Garcia-Zapirain, B. Automatic Text Summarization of Biomedical Text Data: A Systematic Review. Information 2022, 13, 393.
- Moradi, M. Small-world networks for summarization of biomedical articles. arXiv 2019, arXiv:1903.02861.
- Moradi, M.; Dashti, M.; Samwald, M. Summarization of biomedical articles using domain-specific word embeddings and graph ranking. J. Biomed. Inform. 2020, 107, 103452.
- Mridha, M.F.; Lima, A.A.; Nur, K.; Das, S.C.; Hasan, M.; Kabir, M.M. A Survey of Automatic Text Summarization: Progress, Process and Challenges. IEEE Access 2021, 9, 156043–156070.
- Awasthi, I.; Gupta, K. Natural Language Processing (NLP) based Text Summarization—A Survey. In Proceedings of the 2021 6th International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 20–22 January 2021; ISBN 978-1-7281-8501-9.
- Manish, S.; Disha, M. Techniques and Research in Text Summarization—A Survey. In Proceedings of the 2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 4–5 March 2021.
- Gulden, C.; Kirchner, M.; Schüttler, C.; Hinderer, M.; Kampf, M.; Prokosch, H.-U.; Toddenroth, D. Extractive summarization of clinical trial descriptions. Int. J. Med. Inform. 2019, 129, 114–121.
- Alsentzer, E. Extractive Summarization of EHR Discharge Notes. arXiv 2018, arXiv:1810.12085v1.
- Kaur, M.; Mollá, D. Supervised Machine Learning for Extractive Query Based Summarisation of Biomedical Data. In Proceedings of the 9th International Workshop on Health Text Mining and Information Analysis (LOUHI 2018), Brussels, Belgium, 31 October 2018; pp. 29–37.
- Fiszman, M.; Rindflesch, T.C.; Kilicoglu, H. Summarizing drug information in Medline citations. AMIA Annu. Symp. Proc. 2006, 2006, 254–258.
- Sackett, D.L. Evidence-based medicine. In Seminars in Perinatology; Elsevier: Amsterdam, The Netherlands, 1997; Volume 21, pp. 3–5.
- Mollá, D.; Santiago-Martínez, M.E.; Sarker, A.; Paris, C. A corpus for research in text processing for evidence-based medicine. In Language Resources and Evaluation; Springer Science & Business Media: Dordrecht, The Netherlands, 2015.
- Hassanzadeh, H.; Groza, T.; Hunter, J. Identifying scientific artefacts in biomedical literature: The evidence-based medicine use case. J. Biomed. Inform. 2014, 49, 159–170.
- Kanwal, N.; Rizzo, G. Attention-based Clinical Note Summarization. arXiv 2021, arXiv:2104.08942v2.
- Masic, I. Review of Most Important Biomedical Databases for Searching of Biomedical Scientific Literature. Donald Sch. J. Ultrasound Obstet. Gynecol. 2012, 6, 343–361.
- Johnson, A.E.W.; Pollard, T.J.; Shen, L.; Lehman, L.-W.H.; Feng, M.; Ghassemi, M.; Moody, B.; Szolovits, P.; Celi, L.A.; Mark, R.G. MIMIC-III, a freely accessible critical care database. Sci. Data 2016, 3, 160035.
- Available online: https://pubmed.ncbi.nlm.nih.gov/ (accessed on 26 December 2022).
- Available online: https://www.ncbi.nlm.nih.gov/pmc/about/intro/ (accessed on 26 December 2022).
- Available online: https://www.nlm.nih.gov/medline/medline_overview.html (accessed on 26 December 2022).
- Available online: https://www.elsevier.com/en-in/about (accessed on 26 December 2022).
- Available online: https://www.cochranelibrary.com/about/about-cochrane-library (accessed on 26 December 2022).
- Available online: https://www.ebsco.com/products/research-databases/cinahl-database (accessed on 26 December 2022).
- Available online: https://physionet.org/about/ (accessed on 26 December 2022).
- Available online: https://pcornet.org/about/ (accessed on 26 December 2022).
- Feldman, R.; Sanger, J. The Text Mining Handbook. Advanced Approaches in Analysing Unstructured Data; Cambridge University Press: New York, NY, USA, 2007; pp. 13–19.
- Singh, A.; Sharma, A.; Rajput, S.; Bose, A.; Hu, X. An investigation on hybrid particle swarm optimization algorithms for parameter optimization of PV cells. Electronics 2022, 11, 909.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2019, 36, 1234–1240.
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global vectors for word representation. In Proceedings of the EMNLP 2014—2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
- Hliaoutakis, A.; Varelas, G.; Voutsakis, E.; Petrakis, E.; Milios, E. Information retrieval by semantic similarity. Int. J. Seman. Web Inf. Syst. 2006, 2, 55–73.
- Carbonell, J.; Goldstein, J. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval—SIGIR ’98, Melbourne, Australia, 24–28 August 1998; pp. 335–336.
- Sarrouti, M.; El Alaoui, S.O. A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering. J. Biomed. Inform. 2017, 68, 96–103.
- Sarker, A.; Mollá, D.; Paris, C. Query-oriented evidence extraction to support evidence-based medicine practice. J. Biomed. Inform. 2016, 59, 169–184.
- Jin, D.; Szolovits, P. PICO Element Detection in Medical Text via Deep Neural Networks. In Proceedings of the BioNLP 2018 Workshop, Melbourne, Australia, 24 July 2018.
- Mutabazi, E.; Ni, J.; Tang, G.; Cao, W. A Review on Medical Textual Question Answering Systems Based on Deep Learning Approaches. Appl. Sci. 2021, 11, 5456.
- Jin, Q.; Yuan, Z.; Xiong, G.; Yu, Q.; Ying, H.; Tan, C.; Chen, M.; Huang, S.; Liu, X.; Yu, S. Biomedical Question Answering: A Survey of Approaches and Challenges. ACM Comput. Surv. 2022, 55, 1–36.
- Kaddari, Z.; Mellah, Y. Biomedical Question Answering: A Survey of Methods and Datasets. In Proceedings of the 2020 Fourth International Conference On Intelligent Computing in Data Sciences (ICDS), Fez, Morocco, 21–23 October 2020.
- Jin, Q.; Yuan, Z.; Xiong, G.; Yu, Q.; Tan, C.; Chen, M.; Huang, S.; Liu, X.; Yu, S. Biomedical Question Answering: A Comprehensive Review. arXiv 2021, arXiv:2102.05281.
- Soares, M.A.C.; Parreiras, F.S. A literature review on question answering techniques, paradigms and systems. J. King Saud Univ. Comput. Inf. Sci. 2020, 32, 635–646.
- Kitchenham, B. Guidelines for performing Systematic Literature Reviews in software engineering. Engineering 2007, 45, 1051.
- Masic, I. How to Search, Write, Prepare and Publish the Scientific Papers in the Biomedical Journals. Acta Inform. Med. 2011, 19, 68–79.
- Jin, D.; Pan, E.; Oufattole, N.; Weng, W.-H.; Fang, H.; Szolovits, P. What disease does this patient have? A large-scale open domain question answering dataset from medical exams. Appl. Sci. 2020, 11, 6421.
- Available online: https://www.tripdatabase.com/ (accessed on 26 December 2022).
- Available online: https://www.biomedcentral.com/about (accessed on 26 December 2022).
- Available online: https://www.embase.com/landing?status=grey (accessed on 26 December 2022).
- Available online: https://www.ebsco.com/products/research-databases/allied-and-complementary-medicine-database-amed (accessed on 26 December 2022).
- Available online: https://seer.cancer.gov/ (accessed on 26 December 2022).
- Available online: https://bioportal.bioontology.org/ (accessed on 26 December 2022).
- Alam, F.; Afzal, M.; Malik, K.M. Comparative Analysis of Semantic Similarity Techniques for Medical Text. In Proceedings of the 2020 International Conference on Information Networking (ICOIN), Barcelona, Spain, 7–10 January 2020.
- McInnes, B.T.; Pedersen, T. Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text. J. Biomed. Inform. 2013, 46, 1116–1124.
- Patwardhan, S.; Banerjee, S.; Pedersen, T. Using measures of semantic relatedness for word sense disambiguation. In The 4th International Conference on Computational Linguistics and Intelligent Text Processing; Springer: Berlin/Heidelberg, Germany, 2003; pp. 241–257.
- Sanchez, D. Domain Ontology Learning from the Web: An Unsupervised, Automatic and Domain Independent Approach; Akademiker: Catalonia, Spain, 2012.
- Gøeg, K.R.; Cornet, R.; Andersen, S.K. Clustering clinical models from local electronic health records based on semantic similarity. J. Biomed. Inform. 2015, 54, 294–304.
- Shanavas, N.; Wang, H.; Lin, Z.; Hawe, G. Knowledge-driven graph similarity for text classification. Int. J. Mach. Learn. Cybern. 2021, 12, 1067–1081.
- Weng, W.-H.; Chung, Y.-A.; Tong, S. Clinical Text Summarization with Syntax-Based Negation and Semantic Concept Identification. arXiv 2020, arXiv:2003.00353.
- Sugumaran, V.; Storey, V.C. Ontologies for conceptual modeling: Their creation, use, and management. Data Knowl. Eng. 2002, 42, 251–271.
- McInnes, B.T.; Pedersen, T. Evaluating semantic similarity and relatedness over the semantic grouping of clinical term pairs. J. Biomed. Inform. 2015, 54, 329–336.
- Sammut, C.; Webb, G.I. (Eds.) Encyclopedia of Machine Learning; Springer: Boston, MA, USA, 2011.
- Jaccard, P. Nouvelles recherches sur la distribution florale. Bull. Soc. Vaud. Sci. Nat. 1908, 44, 223–270.
- Cai, R.; Zhu, B.; Ji, L.; Hao, T.; Yan, J.; Liu, W. An CNN-LSTM Attention Approach to Understanding User Query Intent from Online Health Communities. In Proceedings of the 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, USA, 18–21 November 2017.
- Sarrouti, M.; El Alaoui, S.O. SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions. Artif. Intell. Med. 2020, 102, 101767.
- Afzal, M.; Alam, F.; Malik, K.M.; Malik, G.M. Clinical Context–Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation. J. Med. Internet Res. 2020, 22, e19810.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
- Khattak, F.K.; Jeblee, S.; Pou-Prom, C.; Abdalla, M.; Meaney, C.; Rudzicz, F. A survey of word embeddings for clinical text. J. Biomed. Inform. 2019, 100, 100057.
- Resnik, P. Using Information Content to Evaluate Semantic Similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence. Available online: https://arxiv.org/abs/cmp-lg/9511007 (accessed on 26 December 2022).
- Le, Q.; Mikolov, T. Distributed representations of sentences and documents. Int. Conf. Mach. Learn. 2014, 32, 1188–1196.
- Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 2017, 5, 135–146.
- National Library of Medicine. UMLS Metathesaurus Fact Sheet. Available online: http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html (accessed on 18 May 2016).
- Boguraev, B.; Briscoe, T.; Carroll, J.; Carter, D.; Grover, C. The derivation of a grammatically indexed lexicon from the Longman Dictionary of Contemporary English. In Proceedings of the 25th conference on Association for Computational Linguistics, Stanford, CA, USA, 6–9 July 1987; pp. 193–200.
- National Library of Medicine. UMLS Specialist Lexicon Fact Sheet. Available online: http://www.nlm.nih.gov/pubs/factsheets/umlslex.html (accessed on 18 May 2016).
- Bada, M. Mapping of biomedical text to concepts of lexicons, terminologies, and ontologies. Methods Mol. Biol. 2014, 1159, 33–45.
- Sánchez, D.; Batet, M. Semantic similarity estimation in the biomedical domain: An ontology-based information-theoretic perspective. J. Biomed. Inform. 2011, 44, 749–759.
- Batet, M.; Sánchez, D.; Valls, A.; Gibert, K. Semantic similarity estimation from multiple ontologies. Appl. Intell. 2013, 38, 29–44.
- Jiang, R.; Gan, M.; Dou, X. From ontology to semantic similarity: Calculation of ontology-based semantic similarity. Sci. World J. 2013, 2013, 793091.
- SNOMED International. SNOMED—Home—SNOMED International. 2019. Available online: http://www.snomed.org/ (accessed on 6 November 2019).
- Available online: https://bioportal.bioontology.org/ontologies/RCD (accessed on 26 December 2022).
- Available online: https://bioportal.bioontology.org/ontologies/NDFRT (accessed on 26 December 2022).
- Available online: https://bioportal.bioontology.org/ontologies/ICD10 (accessed on 26 December 2022).
- Available online: https://www.ncbi.nlm.nih.gov/mesh (accessed on 26 December 2022).
- MedDRA MSSO—MedDRA. Available online: https://www.meddra.org/about-meddra/organisation/msso (accessed on 6 November 2019).
- Cai, X.; Liu, S.; Yang, L.; Lu, Y.; Zhao, J.; Shen, D.; Liu, T. COVIDSum: A linguistically enriched SciBERT-based summarization model for COVID-19 scientific papers. J. Biomed. Inform. 2022, 127, 103999.
- Wehrli, E. Fips, a deep linguistic multilingual parser. In Proceedings of the ACL Workshop on Deep Linguistic Processing, Prague, Czech Republic, 28 June 2007; pp. 120–127.
- Noh, J.; Kavuluru, R. Document retrieval for biomedical question answering with neural sentence matching. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 194–201.
- Moradi, M.; Samwald, M. Clustering of Deep Contextualized Representations for Summarization of Biomedical Texts. arXiv 2019, arXiv:1908.02286.
- Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A pretrained language model for scientific text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 9 November 2019; pp. 3615–3620.
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108.
- Wang, J.; Dong, Y. Measurement of Text Similarity: A Survey. Information 2020, 11, 421.
- Ben Aouicha, M.; Taieb, M.A.H. Computing semantic similarity between biomedical concepts using new information content approach. J. Biomed. Inform. 2016, 59, 258–275.
- Han, M.; Zhang, X.; Yuan, X.; Jiang, J.; Yun, W.; Gao, C. A Survey on the Techniques, Applications, and Performance of Short Text Semantic Similarity; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2020.
- Cajiao, A.Z.; Mateus, A.R. Graph-based Similarity for Document Retrieval in the Biomedical Domain. In Proceedings of the 2022 7th International Conference on Machine Learning Technologies (ICMLT), Rome, Italy, 11–13 March 2022.
- Chen, X.; Jia, S.; Xiang, Y. A review: Knowledge reasoning over knowledge graph. Expert Syst. Appl. 2020, 141, 112948.
- Plaza, L.; Díaz, A.; Gervás, P. A semantic graph-based approach to biomedical summarisation. Artif. Intell. Med. 2011, 53, 1–14.
- Deza, M.M.; Deza, E. Encyclopedia of distances. In Encyclopedia of Distances; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–583.
- Jaccard, P. The distribution of the flora in the alpine zone. New Phytol. 1912, 11, 37–50.
- Andoni, A.; Indyk, P.; Krauthgamer, R. Earth mover distance over high-dimensional spaces. In Proceedings of the Symposium on Discrete Algorithms, San Francisco, CA, USA, 20–22 January 2008; pp. 343–352.
- Manning, C.D.; Schütze, H. Foundations of Statistical Natural Language Processing; MIT Press: Cambridge, MA, USA, 1999.
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
- Iliopoulos, C.S.; Rahman, M.S. New efficient algorithms for the LCS and constrained LCS problems. Inf. Process. Lett. 2008, 106, 13–18.
- Dice, L.R. Measures of the amount of ecologic association between species. Ecology 1945, 26, 297–302.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Landauer, T.K.; Dumais, S.T. A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 1997, 104, 211–240.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
- Sak, H.; Senior, A.; Beaufays, F. Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition. arXiv 2014, arXiv:1402.1128.
- Li, Y.; Bandar, Z.A.; Mclean, D. An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. Knowl. Data Eng. 2003, 15, 871–882.
- Lin, C.-Y. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Barcelona, Spain, 2004; pp. 74–81.
- Schulze, F.; Neves, M. Entity-Supported Summarization of Biomedical Abstracts. In Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016), Osaka, Japan, 11–16 December 2016.
- Aramaki, E.; Miura, Y.; Tonoike, M.; Ohkuma, T.; Masuichi, H.; Ohe, K. Text2table: Medical text summarization system based on named entity recognition and modality identification. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, Boulder, CO, USA, 4–5 June 2009; pp. 185–192.
- Moradi, M.; Ghadiri, N. Quantifying the informativeness for biomedical literature summarization: An itemset mining method. Comput. Methods Programs Biomed. 2017, 146, 77–89.
- Agrawal, R.; Imieliński, T.; Swami, A. Mining association rules between sets of items in large databases. ACM SIGMOD Rec. 1993, 22, 207–216.
- Agrawal, R.; Mannila, H.; Srikant, R.; Toivonen, H.; Verkamo, A.I. Fast Discovery of Association Rules. Adv. Knowl. Discov. Data Min. 1996, 12, 307–328.
- Moradi, M.; Ghadiri, N. Different approaches for identifying important concepts in probabilistic biomedical text summarization. Artif. Intell. Med. 2018, 84, 101–116.
- Balinsky, A.; Balinsky, H.; Simske, S. On the Helmholtz Principle for Data Mining; Hewlett-Packard Development Company, LP.: Palo Alto, CA, USA, 2011.
- Azadani, M.N.; Ghadiri, N.; Davoodijam, E. Graph-based biomedical text summarization: An itemset mining and sentence clustering approach. J. Biomed. Inform. 2018, 84, 42–58.
- Zhang, W.; Yoshida, T.; Tang, X.; Wang, Q. Text clustering using frequent itemsets. Knowl.-Based Syst. 2010, 23, 379–388.
- Moradi, M. Frequent Itemsets as Meaningful Events in Graphs for Summarizing Biomedical Texts. In Proceedings of the 2018 8th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 25–26 October 2018; pp. 135–140.
- Balinsky, H.; Balinsky, A.; Simske, S.J. Automatic text summarization and small-world networks. In Proceedings of the 11th ACM Symposium on Document Engineering, Mountain View, CA, USA, 19–22 September 2011; pp. 175–184.
- Moradi, M. CIBS: A biomedical text summarizer using topic-based sentence clustering. J. Biomed. Inform. 2018, 88, 53–61.
- Larose, D.T. Discovering Knowledge in Data: An Introduction to Data Mining; John Wiley & Sons: Hoboken, NJ, USA, 2014.
- Rouane, O.; Belhadef, H.; Bouakkaz, M. Combine clustering and frequent itemsets mining to enhance biomedical text summarization. Expert Syst. Appl. 2019, 135, 362–373.
- Salton, G.; Wong, A.; Yang, C.S. A vector space model for automatic indexing. Commun. ACM 1975, 18, 613–620.
- MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California: Los Angeles, CA, USA, 1967; pp. 281–297.
- Lee, E.K.; Uppal, K. CERC: An interactive content extraction, recognition, and construction tool for clinical and biomedical text. In Proceedings of the 10th International Workshop on Biomedical and Health Informatics, San Diego, CA, USA, 18–20 November 2019.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Apache Lucene. Available online: http://lucene.apache.org (accessed on 26 December 2022).
- Bada, M.; Eckert, M.; Evans, D.; Garcia, K.; Shipley, K.; Sitnikov, D.; Baumgartner, W.A.; Cohen, K.B.; Verspoor, K.; Blake, J.A.; et al. Concept annotation in the CRAFT corpus. BMC Bioinform. 2012, 9, 161.
- Savova, G.K.; Masanz, J.J.; Ogren, P.V.; Zheng, J.; Sohn, S.; Kipper-Schuler, K.C.; Chute, C.G. Mayo clinical text analysis and knowledge extraction system (cTAKES): Architecture, component evaluation and applications. J. Am. Med. Inform. Assoc. 2010, 17, 507–513.
- Rouane, O. Word Embedding-Based Biomedical Text Summarization. In Emerging Trends in Intelligent Computing and Informatics, Proceedings of the 4th International Conference of Reliable Information and Communication Technology (IRICT2019), Johor, Malaysia, 22–23 September 2019; Springer: Cham, Switzerland, 2019.
- Text Data Preprocessing. Keras. Available online: https://keras.io/preprocessing/text/ (accessed on 7 October 2020).
- Sarker, A.; Yang, Y.-C.; Al-Garadi, M.A.; Abbas, A. A Light-Weight Text Summarization System for Fast Access to Medical Evidence. Front. Digit. Health 2020, 2.
- Davoodijam, E.; Ghadiri, N.; Shahreza, M.L.; Rinaldi, F. MultiGBS: A multi-layer graph approach to biomedical summarization. J. Biomed. Inform. 2021, 116, 103706.
- MetaMap—A Tool for Recognizing UMLS Concepts in Text. Available online: https://metamap.nlm.nih.gov/ (accessed on 25 April 2019).
- Basaldella, M.; Furrer, L.; Tasso, C.; Rinaldi, F. Entity recognition in the biomedical domain using a hybrid approach. J. Biomed. Semant. 2017, 8, 51.
- Rindflesch, T.; Fiszman, M. The interaction of domain knowledge and linguistic structure in natural language processing: Interpreting hypernymic propositions in biomedical text. J. Biomed. Inform. 2003, 36, 462–477.
- Rahmede, C.; Iacovacci, J.; Arenas, A.; Bianconi, G. Centralities of nodes and influences of layers in large multiplex networks. J. Complex Netw. 2018, 6, 733–752.
- Zahid, M.A.H.; Mittal, A.; Joshi, R.; Atluri, G. CLINIQA: A Machine Intelligence Based Clinical Question Answering System. arXiv 2018, arXiv:1805.05927.
- Lin, R.T.; Chiu, J.L.-T.; Dai, H.-J.; Day, M.-Y.; Tsai, R.T.-H.; Hsu, W.-L. Biological question answering with syntactic and semantic feature matching and an improved mean reciprocal ranking measurement. In Proceedings of the 2008 IEEE International Conference on Information Reuse and Integration, Las Vegas, NV, USA, 13–15 July 2008; pp. 184–189.
- Kogan, Y.; Collier, N.; Pakhomov, S.; Krauthammer, M. Towards Semantic Role Labeling & IE in the Medical Literature. AMIA Annu. Symp. Proc. 2005, 2005, 410–414.
- Miller, G.A.; Beckwith, R.; Fellbaum, C.; Gross, D.; Miller, K.J. Introduction to WordNet: An On-line Lexical Database. Int. J. Lexicogr. 1990, 3, 235–244.
- Gobeill, J.; Pasche, E.; Theodoro, D.; Veuthey, A.-L.; Lovis, C.; Ruch, P. Question answering for biology and medicine. In Proceedings of the 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 4–7 November 2009; pp. 1–5.
- Cao, Y.; Liu, F.; Simpson, P.; Antieau, L.D.; Bennett, A.S.; Cimino, J.; Ely, J.; Yu, H. AskHERMES: An online question answering system for complex clinical questions. J. Biomed. Inform. 2011, 44, 277–288.
- Robertson, S.; Zaragoza, H.; Taylor, M. Simple BM25 extension to multiple weighted fields. In Proceedings of the thirteenth ACM International Conference on Information and Knowledge Management, Washington, DC, USA, 8–13 November 2004.
- Cairns, B.L.; Nielsen, R.D.; Masanz, J.J.; Martin, J.H.; Palmer, M.S.; Ward, W.H.; Savova, G.K. The mipacq clinical question answering system. In AMIA Annual Symposium Proceedings; American Medical Informatics Association: Bethesda, MD, USA, 2011; Volume 2011, p. 171.
- Ely, J.; Osheroff, J.; Chambliss, M.; Ebell, M.; Rosenbaum, M. Answering Physicians’ Clinical Questions: Obstacles and Potential Solutions. J. Am. Med. Inform. Assoc. 2005, 12, 217–224.
- Medpedia. Available online: http://www.medpedia.com/ (accessed on 26 December 2022).
- Ni, Y.; Zhu, H.; Cai, P.; Zhang, L.; Qui, Z.; Cao, F. CliniQA: Highly Reliable Clinical Question Answering System. Stud. Health Technol. Inform. 2012, 180, 215–219.
- Available online: www.tripanswers.org (accessed on 26 December 2022).
- Athenikos, S.J.; Han, H.; Brooks, A.D. A Framework of a Logic-based Question-Answering System for the Medical Domain (LOQAS-Med). In Proceedings of the 2009 ACM symposium on Applied Computing, Honolulu, HI, USA, 8 March 2009.
- NLM Clinical Questions Collection. Available online: http://clinques.nlm.nih.gov/ (accessed on 26 December 2022).
- Abacha, A.B.; Zweigenbaum, P. MEANS: A medical question-answering system combining NLP techniques and semantic Web technologies. Inf. Process. Manag. 2015, 51, 570–594.
- Balikas, G.; Krithara, A.; Partalas, I.; Paliouras, G. BioASQ: A challenge on large-scale biomedical semantic indexing and question answering. In International Workshop on Multimodal Retrieval in the Medical Domain; Springer: Cham, Switzerland, 2015.
- Peng, S.; You, R.; Wang, H.; Zhai, C.; Mamitsuka, H.; Zhu, S. Deepmesh: Deep semantic representation for improving large-scale mesh indexing. Bioinformatics 2016, 32, i70–i79.
- Manning, C.D.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S.J.; McClosky, D. The stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Baltimore, MD, USA, 23–24 June 2014.
- Xie, W.; Ding, R.; Yan, J.; Qu, Y. A Mobile-Based Question-Answering and Early Warning System for Assisting Diabetes Management. Wirel. Commun. Mob. Comput. 2018, 2018, 9163160.
- Zhang, X.; Wu, J.; He, Z.; Liu, X.; Su, Y. Medical Exam Question Answering with Large-Scale Reading Comprehension. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
- Zhu, X.; Yang, X.; Chen, H. A Biomedical Question Answering System Based on SNOMED-CT. In Proceedings of the International Conference on Knowledge Science, Engineering and Management, Changchun, China, 17–19 August 2018.
- Ferrández, Ó.; Micol, D.; Muñoz, R.; Palomar, M. DLSITE-1: Lexical Analysis for Solving Textual Entailment Recognition; Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4592, pp. 284–294.
- Brokos, G.I.; Liosis, P.; McDonald, R.; Pappas, D.; Androutsopoulos, I. AUEB at BioASQ 6: Document and Snippet Retrieval. arXiv 2018, arXiv:1809.0636.
- Hui, K.; Yates, A.; Berberich, K.; de Melo, G. PACRR: A position-aware neural IR model for relevance matching. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; pp. 1049–1058.
- Guo, J.; Fan, Y.; Ai, Q.; Croft, W.B. A deep relevance matching model for ad-hoc retrieval. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, Indianapolis, IN, USA, 24–28 October 2016; pp. 55–64.
- Yin, W.; Schutze, H.; Xiang, B.; Zhou, B. ABCNN: Attention-based convolutional neural network for modeling sentence pairs. Trans. Assoc. Comput. Linguist. 2016, 4, 259–272.
- Metzler, D.; Croft, W.B. A Markov random field model for term dependencies. In Proceedings of the 28th Annual International ACM SIGIR Conference. ACM, Salvador, Brazil, 15–19 August 2005; pp. 472–479.
- Sarrouti, M.; Alaoui, S.O.E. A machine learning-based method for question type classification in biomedical question answering. Methods Inf. Med. 2017, 56, 209–216.
- Ozyurt, I.B.; Bandrowski, A.; Grethe, J.S. Bio-AnswerFinder: A system to find answers to questions from biomedical texts. Database 2020, 2020, baz137.
- Yan, Y.; Zhang, B.; Li, X.; Liu, Z. List-wise learning to rank biomedical question-answer pairs with deep ranking recursive autoencoders. PLoS ONE 2020, 15, e0242061.
- Demner-Fushman, D.; Mrabet, Y.; Ben Abacha, A. Consumer health information and question answering: Helping consumers find answers to their health-related information needs. J. Am. Med. Inform. Assoc. 2020, 27, 194–201.
- Almeida, T.; Matos, S. Calling Attention to Passages for Biomedical Question Answering; Springer Nature: Cham, Switzerland, 2020; pp. 69–77.
- McDonald, R.; Brokos, G.I.; Androutsopoulos, I. Deep Relevance Ranking Using Enhanced Document-Query Interactions. arXiv 2018, arXiv:1809.01682.
- Alzubi, J.A.; Jain, R.; Singh, A.; Parwekar, P.; Gupta, M. COBERT: COVID-19 Question Answering System Using BERT. Arab. J. Sci. Eng. 2021, 1–11.
- Available online: https://www.kaggle.com/allen-institutefor-ai/CORD-19-research-challenge (accessed on 26 December 2022).
- Liang, J.; Tsou, C.-H. A Novel System for Extractive Clinical Note Summarization using EHR Data. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, Minneapolis, MN, USA, 7 June 2019; pp. 46–54.
- Gupta, S.; Sharaff, A.; Nagwani, N.K. Biomedical Text Summarization: A Graph-Based Ranking Approach; Advances in Intelligent Systems and Computing; Springer: Singapore, 2021; Volume 1354.
- Gupta, S.; Sharaff, A. Frequent item-set mining and clustering based ranked biomedical text summarization. J. Supercomput. 2022, 79, 139–159.
- Erkan, G.; Radev, D.R. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. J. Artif. Intell. Res. 2004, 22, 457–479.
- Debnath, P.; Castillo, O.; Kumam, P. (Eds.) Soft Computing: Recent Advances and Applications in Engineering and Mathematical Sciences; CRC Press: Boca Raton, FL, USA, 2023.

RQ No. | Research Question (RQ) | Objective/Discussion |
---|---|---|
RQ1 | What are the various biomedical databases available online for automatic biomedical text summarization? | For appropriate text summarization of biomedical documents, it is necessary to investigate and explore databases, their application, structure, and query techniques. |
RQ2 | What are the different semantic enrichment approaches used in biomedical text summarization, and how do they compare? | These approaches must be examined to determine the significance of their application in summarization techniques. |
RQ3 | What are the different similarity metrics used in biomedical text summarization? | A comprehensive analysis is carried out of existing textual similarity measures that can be used in biomedical text-summarizing systems. |
RQ4 | What are the various approaches for automatically summarizing biomedical text, and how do they compare? | A systematic review was conducted with a comparative study of existing systems, taking into account the techniques, the feature extraction methods employed, and performance in the form of accuracy. |
RQ5 | What are the different approaches used for automatic biomedical QA systems, and how do they compare? | The BQA system is one of the most prevalent and significant applications of BTS. A comparative analysis of various biomedical question answering systems was therefore necessary, taking into account their approaches, question processing and formulation techniques, passage retrieval and answer processing methods, and datasets. |

Topic of Study | Inclusion Criteria | Exclusion Criteria |
---|---|---|
Biomedical Databases | The work must refer to highly cited biomedical or clinical databases. | Papers that used databases outside the biomedical or clinical domains. |
Semantic Enrichment | The work must focus on highly cited semantic similarity approaches that compute the similarity between biomedical terms or texts using available biomedical knowledge sources. | Papers that applied semantic similarity approaches to non-biomedical domains. |
Text Similarity Metric | The work must focus on the text similarity metrics most commonly used in the biomedical domain. | Papers that focus on metrics used in domains other than the biomedical or clinical domain. |
Biomedical Text Summarization | The work must refer to techniques used for automatic text summarization in the biomedical or clinical domain. | Papers that apply automatic text summarization methods to non-biomedical documents, or that summarize inputs other than text, such as video or dialog summarization. |
Biomedical Question Answering System | The work must focus on question-answering systems in the biomedical or clinical domain and the techniques they apply. | Papers that present non-biomedical QA systems. |

Database | Query Executed |
---|---|
SCOPUS | TITLE-ABS-KEY ((“biomedical text” OR “document” OR “Biomedical” OR “clinical notes” OR “biomedical domain” OR “biomedical literature” OR “clinical” OR “medical” OR “medical records” OR “clinical records” OR “semantic similarity”) AND (“summarization” OR “text summarization” OR “summary” OR “patient” OR “EHR”) AND (“passage retrieval” OR “question answering” OR “graph-based” OR “machine learning” OR “transformer based” OR “Evidence-based Medicine(EBM)” OR “deep learning” OR “databases” OR “knowledge base” OR “knowledge sources” OR “metric” OR “measure”)) |
Web of Science | TOPIC ((“biomedical text” OR “document” OR “Biomedical” OR “clinical notes” OR “biomedical domain” OR “biomedical literature” OR “clinical” OR “medical” OR “medical records” OR “clinical records” OR “semantic similarity”) AND (“summarization” OR “text summarization” OR “summary” OR “patient” OR “EHR”) AND (“passage retrieval” OR “question answering” OR “graph-based” OR “machine learning” OR “transformer based” OR “deep learning” OR “Evidence-based Medicine(EBM)” OR “databases” OR “knowledge base” OR “knowledge sources” OR “metric” OR “measure”)) |
IEEE | ((“biomedical text” OR “document” OR “Biomedical” OR “clinical notes” OR “biomedical domain” OR “biomedical literature” OR “clinical” OR “medical” OR “medical records” OR “clinical records” OR “semantic similarity”) AND (“summarization” OR “text summarization” OR “summary” OR “patient” OR “EHR”) AND (“passage retrieval” OR “question answering” OR “graph-based” OR “machine learning” OR “transformer based” OR “deep learning” OR “Evidence-based Medicine(EBM)” OR “databases” OR “knowledge base” OR “knowledge sources” OR “metric” OR “measure”)) |
PubMed | (1) Biomedical text summarization: ((((((((biomedical text summarization) OR (clinical summary)) AND (biomedical document)) AND (medical documents)) AND (biomedical literature)) OR (automatic text summarization)) OR (clinical records)) OR (clinical notes)) AND (biomedical) (2) Semantic similarity approaches: ((((biomedical text similarity) OR (semantic similarity)) OR (biomedical domain)) AND (similarity measures) OR (semantic enrichment)) (3) Biomedical question answering systems: ((((biomedical QA)) OR (question answering)) OR (passage retrieval)) OR (biomedical domain) (4) Similarity metrics: ((((similarity measures) AND (text)) OR (document)) OR (NLP)) OR (text similarity metrics) OR (text similarity) OR (biomedical domain) |
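The survey does not say how the PubMed queries in the table above were executed. As a hedged illustration only, the sketch below assumes they are sent to NCBI's E-utilities ESearch service rather than the PubMed web interface. It first checks that a Boolean query's parentheses are balanced, a cheap sanity check for long nested queries like these, and then builds the request URL using ESearch's `db`, `term`, `retmax`, and `retmode` parameters. The helper names `parens_balanced` and `build_esearch_url` are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint. Assumption: queries are run through
# this API; the survey itself does not state how its queries were executed.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def parens_balanced(query: str) -> bool:
    """Return True if every '(' in a Boolean query has a matching ')'."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0


def build_esearch_url(query: str, retmax: int = 20) -> str:
    """Build an ESearch URL that runs a Boolean PubMed query and returns JSON."""
    if not parens_balanced(query):
        raise ValueError("unbalanced parentheses in query")
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": query,      # Boolean query; urlencode escapes spaces and parentheses
        "retmax": retmax,   # maximum number of PMIDs to return
        "retmode": "json",  # JSON instead of the default XML
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Query (4) from the PubMed row of the table.
QUERY_4 = ("((((similarity measures) AND (text)) OR (document)) OR (NLP)) "
           "OR (text similarity metrics) OR (text similarity) OR (biomedical domain)")

if __name__ == "__main__":
    print(build_esearch_url(QUERY_4))
```

Validating the query before building the URL keeps a malformed Boolean expression from silently returning an unexpected result set.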
Name | Description | Content-Type | Search |
---|---|---|---|
PubMed [21] | Includes MEDLINE and contains more than 33 million citations and abstracts of biomedical literature. | Scientific | Informal keyword searching. Automatic mapping of keywords to MeSH terms. Filters to narrow search results. Clinical queries for clinical studies, systematic reviews, and medical genetics topics. |
PubMed Central (PMC) [22] | Full-text digital repository of biomedical and life sciences journal articles, maintained by the National Center for Biotechnology Information (NCBI). | Scientific | Advanced Search Builder for searching by keyword, author, journal, etc. Combined search with Boolean operators. |
BioMed Central [47] | Part of Springer Nature; publishes about 300 peer-reviewed journals that communicate research findings from scientific, technological, engineering, and medical research teams. | Scientific | Offers a number of searches that can be performed only through search templates. |
Embase [48] | Subscription database from Elsevier with content comparable to PubMed/MEDLINE, plus additional coverage of drugs and pharmacology, medical devices, clinical medicine, and basic science relevant to clinical medicine. | Scientific | Quick search by title, abstract, or author keywords. Combined search with the Boolean operator “OR”, using keywords from the Embase thesaurus (Emtree). |
The Cochrane Library [25] | Collection of databases in medicine and other healthcare specialties, including well-conducted controlled trials. | Clinical | Search by title, abstract, or keywords. Keywords include Emtree terms, MeSH terms, and other keywords. |
CINAHL [26] | The Cumulative Index to Nursing and Allied Health Literature (CINAHL), a database of nursing and allied health literature covering 3604 active indexed and abstracted journals. | Scientific | Search by title, abstract, and keywords. Subject headings make searching more effective. Several synonyms can be combined with OR, and phrases enclosed in double quotation marks. |
Allied and Complementary Medicine Database (AMED) [49] | Produced with the help of the British Library’s Health Care Information Service; a specialized bibliographic database designed for doctors, therapists, scientists, and historians. | Scientific | Simple search by keyword or phrase. Multi-word phrases can be enclosed in quotation marks. If no field is chosen in the “Select a Field” option, the search covers keywords, author, and subject by default. Searches can be combined with OR and AND. |
MEDLINE [23] | The NLM’s bibliographic database of more than 28 million references to journal articles in the life sciences, with a focus on biomedicine. A distinctive feature of MEDLINE is that its records are indexed with NLM Medical Subject Headings (MeSH). | Scientific | Advanced Search provides guided mapping of keywords to MeSH terms. Subheadings narrow the search. Clinical queries support EBM clinical reviews. |
ELSEVIER [24] | Specialized in scientific, technical, and medical content. | Scientific | Simple search by keyword, title, and subject area. Combine search by Boolean operators. |
PhysioNet [27] | A research resource for complex physiological signals. It offers free access to large collections of physiological and clinical data, as well as related open-source software. | Clinical EHR | Simple keyword search. Narrow the search by relevance and resource type. |
PCORnet [28] | A public resource that provides a long-sought type of research ecosystem: a fully integrated network with vast, highly representative health data, research expertise, and patient insights. | Clinical | Simple search by keyword. Narrow the search by category, resource type, network partners, and audience. |
Surveillance, Epidemiology, and End Results (SEER) [50] | Supported by the Surveillance Research Program (SRP) in the National Cancer Institute’s Division of Cancer Control and Population Sciences (DCCPS); provides cancer statistics in an effort to reduce the cancer burden among the US population. | Clinical | Simple keyword search. Combined search with Boolean operators. Keyword search for statistical information. |
BioPortal [51] | The primary tool of the National Center for Biomedical Ontology (NCBO): a Web portal and Web-based tools that enable biomedical researchers to access, review, and integrate disparate ontological resources in all areas of biomedical investigation and clinical practice. | Scientific | Simple search by class name or ontology name. Advanced search by property values, obsolete classes, and ontology views. The search can be limited to class definitions or exact matches. |

Model | Pretrained | Handles OOV Words | Prediction-Based | Frequency-Based | Encodes Morphological Information | Level | Evaluation |
---|---|---|---|---|---|---|---|
One-hot encoding [66] | - | - | - | - | - | Words | Computationally expensive and sparse for a large corpus. Context independent. |
Co-occurrence matrix | - | - | - | √ | - | Words | Faster but requires a huge amount of memory. |
Word2Vec [66] | √ | - | √ | - | - | Words | Consumes less space. Captures semantic relations well. CBOW and skip-gram variants. |
PV-DM [69] | √ | - | √ | - | - | Sentences, paragraphs, and documents | Requires extra memory to store both the softmax weights and the word vectors. |
PV-DBOW [69] | √ | - | √ | - | - | Sentences, paragraphs, and documents | Simple and faster. Needs less memory because it stores only the softmax weights, not the word vectors. |
GloVe [32] | √ | - | - | √ | - | Words | Trained on the global co-occurrence matrix of all words. Dense and expressive vector representation. |
fastText [70] | √ | √ | √ | - | √ | Character n-grams and words | Incorporates sub-word information. Memory and computational requirements grow with corpus size. |
ELMo [70] | √ | √ | √ | - | √ | Words | Context-dependent vector representations. Computationally intensive and requires more training time. |
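To make the first two rows of the table above concrete, the following minimal Python sketch builds one-hot vectors and a symmetric word-by-word co-occurrence matrix with a context window of one token. The three tokenized sentences are a hypothetical toy corpus, not data from any surveyed work. The sketch shows why one-hot vectors are sparse and context independent, and why a co-occurrence matrix grows quadratically with vocabulary size.

```python
# Hypothetical toy corpus, already tokenized.
SENTENCES = [
    ["aspirin", "reduces", "fever"],
    ["ibuprofen", "reduces", "fever"],
    ["aspirin", "reduces", "pain"],
]


def build_vocab(sentences):
    """Map each distinct token to an integer index (sorted for determinism)."""
    tokens = sorted({tok for sent in sentences for tok in sent})
    return {tok: i for i, tok in enumerate(tokens)}


def one_hot(token, vocab):
    """One-hot vector: all zeros except a 1 at the token's index."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec


def cooccurrence_matrix(sentences, vocab, window=1):
    """Symmetric |V| x |V| counts of words appearing within `window` tokens of each other."""
    n = len(vocab)
    matrix = [[0] * n for _ in range(n)]
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the word's own position
                    matrix[vocab[word]][vocab[sent[j]]] += 1
    return matrix


if __name__ == "__main__":
    vocab = build_vocab(SENTENCES)
    print(vocab)
    print(one_hot("fever", vocab))  # a single 1 among |V| entries
    for word, row in zip(vocab, cooccurrence_matrix(SENTENCES, vocab)):
        print(f"{word:>10}", row)
```

In this toy matrix, "aspirin" and "ibuprofen" have similar rows because both appear next to "reduces". Count-based models such as GloVe build on exactly this kind of distributional signal.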
Name | Content | Structure | Classes | Maximum Depth |
---|---|---|---|---|
SNOMED CT [78] | Clinical terms | Collection of medical terms created by the College of American Pathologists. Provides the codes, synonyms, terms, and descriptions required in clinical reports. | 361,588 | 28 |
RCD [79] | Clinical Terms Version 3 (CTV3) (Read Codes) | Standard clinical terminology that clinicians use to record patient findings and procedures in health and social care. | 140,065 | 17 |
National Drug File Reference Terminology (NDF-RT) [80] | Pharmacy | A formal representation of medications that describes their components, chemical makeup, dosage form, physiological effects, mechanism of action, pharmaceutics, and associated disorders. | 36,202 | 11 |
International Classification of Diseases (ICD) [81] | Morbidity entities | Provides information about mortality and morbidity in populations, coded with ICD codes. | 12,445 | 4 |
Medical Subject Headings (MeSH) [82] | Medical subject headings | Used to index journal articles and books in the life sciences. Headings cover categories such as anatomy, diseases, and chemicals and drugs. | 347,692 | 15 |
Medical Dictionary for Regulatory Activities (MedDRA) [83] | International medical terminology | Designed for ease of data entry, retrieval, analysis, and presentation. Applies to all phases of drug development, excluding animal toxicology. | 75,741 | - |
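Ontologies such as those listed above supply the “is-a” hierarchy that IC-based similarity measures rely on. The sketch below is a hedged illustration, not a method taken from the surveyed papers. It rests on three assumptions:

- a small hypothetical disease hierarchy stands in for a real ontology;
- information content is computed intrinsically from hyponym counts, IC(c) = log(N / (hypo(c) + 1)), instead of from corpus frequencies;
- Lin's measure, sim(c1, c2) = 2·IC(LCS) / (IC(c1) + IC(c2)), serves as one representative IC-based measure.

```python
import math

# Hypothetical single-inheritance "is-a" hierarchy (child -> parent).
# Real ontologies such as SNOMED CT or MeSH allow several parents per concept.
PARENT = {
    "disease": None,
    "infection": "disease",
    "cardiovascular disease": "disease",
    "bacterial infection": "infection",
    "viral infection": "infection",
    "tuberculosis": "bacterial infection",
    "influenza": "viral infection",
    "hypertension": "cardiovascular disease",
}


def ancestors(concept):
    """The concept itself plus every concept above it in the hierarchy."""
    result = set()
    while concept is not None:
        result.add(concept)
        concept = PARENT[concept]
    return result


def intrinsic_ic(concept):
    """IC(c) = log(N / (hypo(c) + 1)), where hypo(c) counts c's descendants."""
    n = len(PARENT)
    hypo = sum(1 for c in PARENT if c != concept and concept in ancestors(c))
    return math.log(n / (hypo + 1))


def lowest_common_subsumer(c1, c2):
    """Most informative (deepest) concept that subsumes both c1 and c2."""
    return max(ancestors(c1) & ancestors(c2), key=intrinsic_ic)


def lin_similarity(c1, c2):
    """Lin's measure: 2 * IC(LCS) / (IC(c1) + IC(c2)), in [0, 1]."""
    denom = intrinsic_ic(c1) + intrinsic_ic(c2)
    if denom == 0:  # both concepts are the root
        return 1.0
    return 2 * intrinsic_ic(lowest_common_subsumer(c1, c2)) / denom


if __name__ == "__main__":
    for a, b in [("tuberculosis", "bacterial infection"),
                 ("tuberculosis", "influenza"),
                 ("tuberculosis", "hypertension")]:
        print(a, "|", b, "|", lowest_common_subsumer(a, b), "|", round(lin_similarity(a, b), 3))
```

Concepts whose only shared ancestor is the root get a similarity of zero, because the root carries no information content.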
Paper | Name | Proximity by | Assessment by | Description | Limitation | Range |
---|---|---|---|---|---|---|
[96] | Cosine | Distance | Length | Similarity computed as the cosine of the angle between two vectors. Used for continuous and categorical variables. | Considers only the direction of the vectors and ignores their magnitude. Does not work efficiently with nominal data. | 0 to 1 for non-negative vectors (e.g., tf-idf); −1 to 1 in general |
[62,97] | Jaccard | Representation and numerical features | Phrase-based | Size of the intersection divided by the size of the union of two sets. Used for continuous and categorical data. | Does not work efficiently with nominal data. Large datasets can have a big impact on the index. | 0 to 1 (0% to 100%) |
[98] | Word Mover’s Distance | Distance | Semantics | Minimum cumulative distance that the embedded words of one text must travel to reach the embedded words of another, computed with the earth mover’s distance using word vectors and linear programming. | High computational cost. Cannot handle OOV words. | - |
[96] | Euclidean | Distance | Length | Straight-line distance between two points in Euclidean space. | Performs poorly with high-dimensional data. | - |
[99] | JS (Jensen–Shannon) divergence | Distance | Distribution | Measures the similarity between two probability distributions. Used with LDA (latent Dirichlet allocation). | - | 0 to 1 (with base-2 logarithm) |
[100] | KL (Kullback–Leibler) divergence | Distance | Distribution | Compares two discrete probability distributions. | Not symmetric and does not satisfy the triangle inequality. | 0 to +∞ |
[101] | LCS (longest common substring) | Representation and numerical features | Character-based | Measures the similarity between two strings by the length of their longest common contiguous substring. | Less accurate. | 0 to 1 |
[102] | Dice | Representation and numerical features | Phrase-based | Compares two sets statistically: twice the number of elements shared by both sets, divided by the total number of elements in the two sets. | Does not satisfy the triangle inequality. | 0 to 1 |
[66] | Word2vec | Representation and numerical features | Corpus-based, shallow window-based | Distributed numerical vector representations of words. | Cannot handle unfamiliar or OOV words. Sub-linear relationships are defined only implicitly. | −1 to 1 |
[32] | GloVe | Representation and numerical features | Corpus-based, shallow window-based | Trained on the word co-occurrence matrix. Restricts the word vectors to sub-linear relationships in the vector space. | Cannot handle unknown or OOV words. Requires a lot of memory for storage. | −1 to 1 |
[103] | BERT (Bidirectional Encoder Representations from Transformers) | Representation and numerical features | Corpus-based, transformer-based contextual model | Encodes a large amount of information into a set of dense vectors; vectors that are more closely aligned are more semantically similar, and vice versa. | Computationally intensive at inference time. Cannot handle long text sequences because its input length is limited. | −1 to 1 |
[104] | LSA (Latent Semantic Analysis) | Representation and numerical features | Corpus-based, matrix factorization | Extracts the hidden themes that a text or document conveys, using singular value decomposition (SVD). | Relies on SVD, which is computationally expensive. Does not handle polysemy (words with multiple meanings) appropriately. Not suitable for all types of problems. | −1 to 1 |
[105] | LDA (Latent Dirichlet Allocation) | Representation and numerical features | Corpus-based, probabilistic topic model | Probabilistic topic modeling. Better disambiguation of words. More precise assignment of documents to topics. | Topics are assumed to be uncorrelated, and the number of topics must be fixed in advance. | - |
[106] | Bi-LSTM (Bidirectional Long Short-Term Memory) | Representation and numerical features | Multi-semantic document text matching | Captures sequence information in both directions. Uses gates to regulate the flow of information. | Prone to overfitting. Computationally expensive. | - |
[94] | Knowledge Graph (KG) | Representation and numerical features | Graph structure | Embeds the entities and relations of a knowledge graph into a continuous low-dimensional vector space, producing representations that effectively convey semantic facts. | Depends on the coverage, correctness, and freshness of the knowledge graph. | - |
[68,107] | IC-based measure | Information content | Knowledge-based | Uses the information content (IC) of two concepts and of their Lowest Common Subsumer (LCS), extracted from the “is-a” hierarchy, to compute their semantic similarity. | Two pairs with the same LCS and the same sum of IC(c1) and IC(c2) will have the same similarity. | - |
[108] | ROUGE-N (Recall-Oriented Understudy for Gisting Evaluation) | Co-occurrence | N-gram-based | Determines the proportion of n-grams that match between the model and reference texts; reported as ROUGE recall, precision, and F1-score. | Cannot capture synonymous concepts or topic coverage. | 0 to 1 |
[108] | ROUGE-SU (skip-bigram plus unigram) | Co-occurrence | Skip-bigram-based, with maximum skip distances of 1, 4, or 9 | Adds unigram counting to skip-bigram co-occurrence, so a candidate sentence is given credit even if it shares no word pairs with its references. | Does not account for different words that have the same meaning. | 0 to 1 |
[108] | ROUGE-L | Representation | String-based | Measures the longest common subsequence (LCS) of words shared by the candidate and reference texts. | Matches must be in sequence but need not be consecutive; only the longest common subsequence is counted, so shorter or alternative common subsequences are not reflected. | 0 to 1 |
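Several of the set- and distribution-based metrics in the table above can be sketched in a few lines of standard-library Python. This is a minimal illustration under simple assumptions, not a reference implementation from the cited works:

- Jaccard and Dice compare whitespace-tokenized word sets;
- cosine compares raw count vectors;
- KL and JS compare discrete distributions using base-2 logarithms, which bounds JS in [0, 1].

The empty-set and zero-vector conventions in the code are also assumptions.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors (direction only; magnitude ignored)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # convention: zero vector -> 0.0


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of tokens treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0  # convention: two empty sets -> 1.0


def dice(a, b):
    """2|A ∩ B| / (|A| + |B|) for two collections of tokens treated as sets."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def kl_divergence(p, q):
    """KL(P || Q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    s1 = "aspirin reduces fever".split()
    s2 = "aspirin lowers fever".split()
    print("Jaccard:", jaccard(s1, s2))  # 2 shared tokens out of 4 distinct tokens
    print("Dice:", round(dice(s1, s2), 3))
    # Bag-of-words counts over the vocabulary [aspirin, fever, lowers, reduces]
    print("Cosine:", round(cosine([1, 1, 0, 1], [1, 1, 1, 0]), 3))
    p, q = [0.5, 0.5], [0.9, 0.1]
    print("KL(p||q):", round(kl_divergence(p, q), 3), "KL(q||p):", round(kl_divergence(q, p), 3))
    print("JS:", round(js_divergence(p, q), 3))
```

For the two example sentences, Jaccard is 0.5 and Dice is 2/3, consistent with the identity Dice = 2J / (1 + J). Swapping the arguments changes the KL divergence but not the JS divergence, which matches the asymmetry noted for KL in the table.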
Paper | Supervised/Unsupervised Approach | Model | Semantically Aware Feature Extraction | Classification/Clustering/Ranking | Performance | Corpus |
---|---|---|---|---|---|---|
[109] | Supervised | Graph-based summarizer with named entity recognition (NER) | Maps the words in the linguistic index to the entities in the NER index | Extended the LexRank graph-based algorithm with NER [99,110]; graph-based Entity Rank approach | ROUGE scores increased for both the unweighted and the weighted Entity Rank | PubMed scientific biomedical abstracts retrieved through the Entrez Programming Utilities |
[111] | Supervised | Itemset-based summarizer | Extracted concepts [95], excluding very generic concepts. | Sentences ranked by summing the support values of the itemsets that cover them. Itemset mining with the Apriori algorithm [112,113] | ROUGE metrics | 400 biomedical articles from BioMed Central’s corpus |
[114] | Supervised | Bayesian summarizer | All extracted concepts; concepts [95] extracted excluding generic semantic types; frequency-based ranking of features; meaningfulness computed with the Helmholtz principle [115]; CF-IPF approach to feature classification [112] | Naïve Bayes for classification | Bayesian summarizer approach | BioMed Central’s corpus |
[116] | Unsupervised | Graph-based biomedical text summarizer | Extracted concepts [95], excluding concepts with generic semantic types. Correlations among multiple concepts discovered through frequent itemset mining. | Graph-based minimum spanning tree clustering algorithm [117] | ROUGE scores | 400 biomedical articles from BioMed Central’s open-access corpus |
[118] | Unsupervised | Graph-based summarizer | Extracted concepts [95], excluding very generic concepts. Itemset mining. | Graph-based small-world network approach [119] | ROUGE-2 | Corpus of 300 biomedical full-text articles from BioMed Central’s corpus |
[13] | Supervised | Extractive query-based summarizer | Sentences and queries are vectorized using the tf-idf approach. | Regression and classification. Support Vector Machine | Classification performs better than regression | BioASQ data set |
[12] | Supervised | LSTM model | Classifier that labels topics in history of present illness (HPI) notes | LSTM model | Precision (P), recall (R), and F1 score; F1 = 0.88 | MIMIC-III |
[120] | Hybrid approach | Clustering and itemset mining-based summarizer (CIBS) | Itemset mining with the Apriori algorithm [112,113] | Agglomerative hierarchical clustering algorithm [121] | ROUGE scores for multi-document summarization | Multi-document corpus of 25 collections, each containing 300 documents (PubMed abstracts) and a model summary. Single-document corpus of 400 scientific biomedical articles from BioMed Central’s corpus. |
[6] | Supervised | Small-world network-based summarizer | Helmholtz principle [115] to calculate the meaningfulness of concepts | Graph-based small-world network approach | ROUGE | 300 biomedical articles from BioMed Central’s corpus |
[122] | Unsupervised | Clustering and itemset-based summarizer | Concept frequency-sentence frequency (CF-SF) vector space model [123]; concepts extracted [95], excluding generic semantic types | Itemset mining using the Apriori algorithm [96]; k-means clustering [124] | ROUGE | 100 biomedical full-text papers from BioMed Central |
[87] | Unsupervised | pre-trained deep language model BERT | Pre-trained BERT on Wikipedia and BookCorpus | Agglomerative hierarchical clustering algorithm [121] | R1 = 0.7639 R2 = 0.3481 | Articles from BioMed Central. |
[125] | Supervised | MINTS (Multi Indicator Text Summarization Algorithm). | Feature matrix using 5 pointers of significance such as: length of the sentence position, term relevance rate standardized degree centrality, cross-over with global term frequency distribution determined using the Srensen Dicecoefficient/list (DS) as a comparability metric [126] | Apache Lucene [127] Random forests classifier [126] Aggregated ranking of indicators of relevance | ROUGE-1: 0.414 ROUGE-2:0.136 ROUGE-SU4:0.171 | Articles from the Colorado Richly Annotated Full Text (CRAFT) corpus [128]. Indexed database of Medline abstracts |
[58] | Supervised | Syntax-based negation and semantic concept identification summarizer | Concept recognition using cTAKES [129] | cTAKES clinical NER [129] using regular expressions | Negation detection accuracy; concept identification | Clinical narrative texts from the MIMIC-III critical care database [20], which contains 58,976 hospital admissions |
[7] | Unsupervised | Domain-specific word embeddings and graph-based summarizer | Three versions of BioBERT pre-trained on PubMed abstracts, PubMed Central (PMC) full-text articles, and a combination of both, respectively | PageRank algorithm | ROUGE-1, ROUGE-2 | Corpus of 2000 articles retrieved from PubMed Central |
[130] | Unsupervised | Word-embedding-based biomedical text summarizer | word2vec pre-trained on PubMed, PMC, and a recent English Wikipedia dump | Graph-based PageRank algorithm | ROUGE-1, ROUGE-2, ROUGE-3, ROUGE-SU4 | Corpus of 200 biomedical papers from the BioMed Central full-text database |
[65] | Supervised | Biomed Summarizer | Keras tokenizer [131]; prognosis quality recognition model (AdaBoost MLP) trained on 5 features: title, abstract, article type, publishing journal, and authors; semantic enrichment using ontologies | Bi-LSTM PICO classifier with two additional classes, Aim and Results; aggregate score of relevance, study type, venue credibility, and freshness | Accuracy of quality-article identification = 95.41%; classification accuracy = 93% | PubMed abstracts |
[132] | Supervised | Word-embedding-based Maximal Marginal Relevance [34] | Pre-trained word2vec (skip-gram) embeddings from PubMed and PubMed Central (PMC); 5 features from the QSpec system [36] | Maximal Marginal Relevance [34] | F1 score | Clinical Inquiries section of The Journal of Family Practice |
[133] | Unsupervised | MultiGBS | MetaMap [134], OGER [135], and SemRep [136] to extract 3 types of relationships: semantic, word, and co-reference | Multi-layer graph approach with the MultiGBS sentence selection algorithm [137] | F-measure; ROUGE-L | 450 biomedical scientific articles from BioMed Central |
[18] | Supervised | Attention-based clinical note summarizer | Fine-tuned BERT model used for word embeddings | Sentences with high attention scores, calculated by correlating token, segment, and positional embeddings | KLD = 0.795, JSD = 0.405 | ICD-9-labeled MIMIC-III discharge notes |
[84] | Supervised | SciBERT-based summarizer | Pre-trained SciBERT [88] | Graph attention network-based graph encoder to encode sentence and word co-occurrence graphs | Kappa/alpha: informativeness = 0.669/0.671, coherence = 0.602/0.605, redundancy = 0.653/0.656, fluency = 0.689/0.692 | COVID-19 open research corpus built from PMC, PubMed, and the WHO database |
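Two of the summarizers above ([7], [130]) rank sentences with PageRank over a graph whose edges encode sentence similarity. The sketch below illustrates only that ranking step and rests on stated assumptions: plain term-frequency vectors with cosine similarity stand in for the BioBERT or word2vec embeddings those systems use, and `pagerank_summary`, the damping factor of 0.85, and the iteration count are hypothetical, illustrative choices rather than settings reported in the cited works.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (illustrative stand-in for domain embeddings)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pagerank_summary(sentences, k=2, damping=0.85, iterations=50):
    """Hypothetical helper: return the k top-ranked sentences in document order."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = [Counter(tokenize(s)) for s in sentences]
    # Undirected, weighted sentence graph: edge weight = cosine similarity, no self-loops.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight leaving each sentence node
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Score held by isolated sentences is spread uniformly, as in standard PageRank.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0)
        scores = [
            (1 - damping) / n
            + damping * dangling / n
            + damping * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    # Pick the k highest-scoring sentences, then restore their original order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        "BRCA1 mutations increase the risk of breast cancer.",
        "Patients with BRCA1 mutations are screened for breast cancer earlier.",
        "The weather in the study city was mild.",
        "Breast cancer risk in BRCA1 carriers guides screening decisions.",
    ]
    print(pagerank_summary(doc, k=2))
```

Sentences that share many terms with the rest of the document accumulate score, while the off-topic sentence about the weather stays near the bottom. Swapping `cosine` over term counts for similarity between contextual embeddings would bring the sketch closer to the embedding-based systems in the table.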
Paper | Dataset Content Type | Question Processing | Document Processing | Passage Retrieval | Answer Processing | Answer Type | Databases |
---|---|---|---|---|---|---|---|
[138] | Clinical | Question analysis with MetaMap Transfer (MMTx) [134] and UMLS [71]; question classification based on weighted phrase annotation | Machine learning classifiers for document classification; cosine similarity to search for relevant documents | Passage retrieval using similarity vectors | Topic clustering, ranking, and hierarchical answer representation | Passage | 1700 abstracts related to pancreatic cancer from PubMed |
[139] | Scientific | Question classification and query modification using named entity recognition (NER) and semantic role labeling (SRL) [140] | WordNet [141] and Longman’s dictionary [72] used with a Google interfacing program | NER, SRL, and ranking | Linear answer ranking model | Passage | |
[142] | Scientific | Deep syntactic representation of the questions using FIPS, a Government and Binding parser [85] | Document retrieval through PubMed | Ranking of descriptors belonging to the target set | Ranking of descriptors belonging to the target set | Candidate answers | 5000 MEDLINE abstracts |
[143] | Clinical | - | Probabilistic relevance model BM25 [144] | Longest common subsequence [101] | Topical clustering and ranking | Multi-sentence passages | MEDLINE abstracts, eMedicine documents, clinical guidelines, full-text articles, and Wikipedia documents |
[145] | Clinical | Question processing with the cTAKES clinical text analysis system [129] | Document retrieval and ranking of full texts using Lucene indexing [127] | Paragraph-level baseline combining document-level and paragraph-level scores | Rule-based and ML-based reranking | Paragraphs | Medpedia [146] and Cliniques corpus [147] |
[148] | Clinical | PICO-based question templates | Lucene indexer [127] for relevant document retrieval | Top-N matched clinical evidence taken as the candidate answers | Probability-based score | Paragraphs | Trip Answers website [149] |
[150] | Clinical | Customized question templates from [151] | Web search engines (Google and PubMed) | Description Logic (DLF) and the UMLS Semantic Network | DLF pattern matching between question and answer | Semantic triples of the form subject-predicate-object | Google and PubMed |
[152] | Medical | MESA ontology-based extraction of medical entities, semantic relations, and additional patient information | RDF annotations of the source documents and SPARQL queries | RDF annotations with SPARQL queries | Three steps: query relaxation, semantic search, and ranking | Factoid, definition | MEDLINE articles |
[153] | Clinical | Semantic annotation of questions with MeSH [154] | Retrieval of relevant documents from databases and knowledge bases using PubMed curators [21] | Annotation of passages with ontological concepts | Ranked list of candidate answers | Factoid or collection of text snippets | PubMed articles |
[35] | Clinical | MetaMap [134] for query construction | PubMed search engine and UMLS similarity for question concepts [21,71]; document reranking using MetaMap [134] | Stanford CoreNLP [155] to retrieve relevant passages | BM25 [144] to rank passages | Passage | PubMed documents |
[156] | Clinical (diabetes) | Regular expression matching for question-answer pair extraction | - | - | Latent semantic indexing (LSI)-based similarity calculation and answer ranking [104] | Candidate passage | Historical health data, LMD-FAQ repository, web knowledge base |
[157] | Clinical/examination | Text sequences as input to the SeaReader model | Apache Lucene [127] followed by BM25 ranking [144] | - | Attention scores used to rank answers | Passage | National Medical Licensing Examination |
[158] | Clinical | Annotation of questions using WordNet and the SNOMED ontology | - | - | Question-answer template matching using semantic acquisition and text implication algorithms [159] | - | 500 user questions collected from the medical field |
[160] | Scientific | Term-based interaction model | Document retrieval using BM25 [144] and reranking with one of two models, PACRR [161] or ABEL-DRMM [162] | BCNN [163] used to score snippets | Document relevance scores used to select the top K snippets as the answer passage | Passage | Articles from the MEDLINE/PubMed 2018 baseline collection |
[86] | Scientific | Question and answer sentence encoding using BiLSTM | Sequential dependence model (SDM) based on the Markov random field model [164] | - | Semantic matching model | - | PubMed abstracts |
[64] | Clinical | Handcrafted lexico-syntactic patterns and a machine learning algorithm for question classification [165] | PubMed search engine and UMLS [71] similarity | Stanford CoreNLP and BM25 [144] | Type-specific approaches, such as UMLS and BM25 [71,144] | Yes/no, factoid, list, and summary | MEDLINE database |
[166] | Clinical | LSTM- and DNN-based query selection to obtain keyword queries | Iterative Elasticsearch | - | Weighted Relaxed Word Mover’s Distance [98] and supervised answer candidate reranking using BERT | Passage and factoid | Corpus of abstracts extracted from PMC |
[167] | Scientific | Query formulation using NLTK | Search engine to retrieve relevant documents | Semantic vectors generated for question-snippet pairs | Probabilities of Q-A relations and ranking using an RNN | Snippets | Biomedical literature from PubMed/MEDLINE |
[168] | Consumer health | Question processing with SVM, rule-based methods, and question frame extraction | Greater weight given to question focus and type in the query to retrieve relevant documents | IR-based and entailment-based answer retrieval using BM25 [144] and a feature-based classifier, respectively | Conventional team-draft interleaving to score answer sentences | Paragraph | LiveQA-Med 2017 and Alexa MedlinePlus collections |
[169] | Scientific | - | Elasticsearch (ES) with BM25 [144] to retrieve relevant documents | Neural ranking model DeepRank [170] | Aggregation network for ranking | Passage | PubMed articles |
[171] | Scientific | - | TF-IDF vectorizer and cosine similarity | Pre-trained DistilBERT [89] model | Top 3 answers selected by a weighted combination of retriever and reader scores | Passage | CORD-19: COVID-19 Open Research Dataset [172] |
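BM25 [144] recurs as the document- or passage-ranking component in many of the BQA systems above (e.g., [143], [35], [160], [169]). A minimal in-memory Okapi BM25 scorer is sketched below under stated assumptions: simple regex tokenization, the Lucene-style non-negative IDF, and the defaults k1 = 1.2 and b = 0.75. The `BM25` class and its `score` and `rank` methods are hypothetical helpers for illustration; Lucene- and Elasticsearch-based systems add analyzers, inverted indexes, and other refinements.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; real systems use proper text analyzers."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Hypothetical Okapi BM25 scorer over a small in-memory passage list."""

    def __init__(self, passages, k1=1.2, b=0.75):
        if not passages:
            raise ValueError("need at least one passage")
        self.k1, self.b = k1, b
        self.docs = [tokenize(p) for p in passages]
        self.tfs = [Counter(d) for d in self.docs]
        self.n = len(self.docs)
        # Average passage length; fall back to 1.0 if every passage is empty.
        self.avgdl = sum(len(d) for d in self.docs) / self.n or 1.0
        df = Counter(t for d in self.docs for t in set(d))
        # Lucene-style IDF: log(1 + (N - df + 0.5) / (df + 0.5)), always positive.
        self.idf = {t: math.log(1 + (self.n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        """BM25 score of passage i for the query."""
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) and passage-length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query, top_n=3):
        """Indices of the top_n passages, best first."""
        order = sorted(range(self.n), key=lambda i: self.score(query, i), reverse=True)
        return order[:top_n]


if __name__ == "__main__":
    passages = [
        "Metformin is a first-line drug for type 2 diabetes.",
        "Insulin therapy is used when glucose control fails.",
        "BM25 ranks passages by term frequency and document length.",
    ]
    retriever = BM25(passages)
    for idx in retriever.rank("first-line drug for type 2 diabetes", top_n=2):
        print(idx, passages[idx])
```

In a retriever-reader pipeline such as [171], scores like these would typically be combined with a neural reader's scores; the weighting scheme is system-specific and is not modeled here.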
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pawar, D.; Phansalkar, S.; Sharma, A.; Sahu, G.K.; Ang, C.K.; Lim, W.H. Survey on the Biomedical Text Summarization Techniques with an Emphasis on Databases, Techniques, Semantic Approaches, Classification Techniques, and Similarity Measures. Sustainability 2023, 15, 4216. https://doi.org/10.3390/su15054216