Improving Consumer Health Search with Field-Level Learning-to-Rank Techniques
Abstract
1. Introduction
2. Literature Review
2.1. Understandability in Consumer Health Search
2.2. Learning-to-Rank Techniques
3. Methodology
3.1. Hypothesis
3.2. F-LTR Approach
3.2.1. Stage One: Building f-LTR Models
3.2.2. Stage Two: Fusion of f-LTR Models
3.3. Rationale for the Methodology
4. Experiments
4.1. Datasets
4.2. Features Extracted
4.3. Developed Rankers
4.4. Evaluation Metrics
5. Results
5.1. Comparing f-LTR Model to the Standard LTR Model
5.2. Fused f-LTR Rankers
5.3. Comparing to the State-of-the-Art Techniques
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
CHS | Consumer health search |
CHIR | Consumer health information retrieval |
LTR | Learning to Rank |
f-LTR | Field-level Learning to Rank |
AP | Average Precision |
IDCG | Ideal Discounted Cumulative Gain |
IR | Information retrieval |
MAP | Mean Average Precision |
NDCG | Normalized Discounted Cumulative Gain |
NIH | National Institutes of Health |
AMA | American Medical Association |
SMOG | Simple Measure of Gobbledygook |
Field | Description |
---|---|
H1–H6 | Section headings at different levels; H1 is the highest-level heading and H6 is the lowest level. |
Title | A document title. |
Header | Defines a header for a document or section. |
Meta | Metadata of a document such as author, publication date, keywords, etc. |
Anchor | Anchor text: the visible text of a hyperlink pointing to a URL on a web page. |
Body | Body content of a document. |
Else | Text that does not belong to any of the other fields. |
Whole | The contents of the full document. |
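To make the field definitions above concrete, the following is a minimal sketch (not the authors' indexing pipeline) that splits an HTML page into the four field groups used later: Title, H1, Else and Whole. It assumes, for illustration only, that "Else" collects visible text falling outside the `<title>` and `<h1>` elements; the class and function names are hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag; they are not pushed onto the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FieldExtractor(HTMLParser):
    """Split the visible text of an HTML page into Title, H1, Else and Whole fields.

    Assumption for this sketch: "Else" holds text outside <title> and <h1>.
    """

    def __init__(self):
        super().__init__()
        self.fields = {"title": [], "h1": [], "else": [], "whole": []}
        self._open = []  # stack of currently open (non-void) tags

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self._open:
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        # Skip whitespace and non-visible script/style content.
        if not text or "script" in self._open or "style" in self._open:
            return
        self.fields["whole"].append(text)
        if "title" in self._open:
            self.fields["title"].append(text)
        elif "h1" in self._open:
            self.fields["h1"].append(text)
        else:
            self.fields["else"].append(text)


def extract_fields(html):
    """Return a dict mapping each field name to its space-joined text."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.fields.items()}
```

Each field's text can then be tokenized and indexed separately, so that every retrieval feature is computed once per field rather than once per document.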
Component | CLEF 2016–2017 Collection | CLEF 2018 Collection |
---|---|---|
Document collection | ClueWeb12-B13 | 5,535,120 web pages |
Queries | 300 | 50 |
Qrels entries | 269,232 | 18,763 |
Feature | Title field (T) | H1 field (H) | Else field (E) | Full document (F) |
---|---|---|---|---|
TFIDF | T-TFIDF | H-TFIDF | E-TFIDF | F-TFIDF |
TF | T-TF | H-TF | E-TF | F-TF |
IDF | T-IDF | H-IDF | E-IDF | F-IDF |
BM25 | T-BM25 | H-BM25 | E-BM25 | F-BM25 |
HiemstraLM | T-HiemstraLM | H-HiemstraLM | E-HiemstraLM | F-HiemstraLM |
DirichletLM | T-DirichletLM | H-DirichletLM | E-DirichletLM | F-DirichletLM |
BB2 | T-BB2 | H-BB2 | E-BB2 | F-BB2 |
PL2 | T-PL2 | H-PL2 | E-PL2 | F-PL2 |
Dl | T-Dl | H-Dl | E-Dl | F-Dl |
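The feature table crosses each retrieval feature with each field group. The sketch below illustrates that pattern for five of the nine features (TF, IDF, TFIDF, BM25 and the document length Dl), computed from per-field collection statistics; Hiemstra LM, Dirichlet LM, BB2 and PL2 would be added per field in the same way. The `stats` layout, the BM25-style IDF and the default `k1`/`b` values are assumptions for illustration and will not match any particular toolkit's implementation exactly.

```python
import math
from collections import Counter

# Field prefixes follow the feature table: T = Title, H = H1, E = Else, F = full document.
FIELD_PREFIX = {"title": "T", "h1": "H", "else": "E", "whole": "F"}


def field_features(query_terms, tokens, stats, k1=1.2, b=0.75):
    """Score one document field against a query.

    stats holds per-field collection statistics (hypothetical layout):
      N: number of documents
      avgdl: average length of this field
      df: term -> number of documents whose field contains the term
    """
    counts = Counter(tokens)
    dl = len(tokens)
    n_docs, avgdl, df = stats["N"], stats["avgdl"], stats["df"]
    # BM25 length normalisation for this field (guard against an always-empty field).
    norm = k1 * (1 - b + b * dl / avgdl) if avgdl > 0 else k1
    tf = idf_sum = tfidf = bm25 = 0.0
    for term in query_terms:
        f = counts.get(term, 0)
        n = df.get(term, 0)
        idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))  # non-negative BM25-style IDF
        tf += f
        idf_sum += idf
        tfidf += f * idf
        if f:
            bm25 += idf * f * (k1 + 1) / (f + norm)
    return {"TF": tf, "IDF": idf_sum, "TFIDF": tfidf, "BM25": bm25, "Dl": dl}


def feature_vector(query_terms, doc_fields, stats_by_field):
    """Concatenate per-field features into one vector keyed like 'T-BM25'."""
    vector = {}
    for field, prefix in FIELD_PREFIX.items():
        feats = field_features(query_terms, doc_fields[field], stats_by_field[field])
        for name, value in feats.items():
            vector[f"{prefix}-{name}"] = value
    return vector
```

With all nine features this pattern yields 9 × 4 = 36 values per query-document pair; keeping only the keys of one prefix (for example `T-`) gives the single-field feature set that one field-level ranker would be trained on.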
Ranker | Method Description |
---|---|
RT | f-LTR model with F_Title features |
RH | f-LTR model with F_H1 features |
RE | f-LTR model with F_Else features |
RF | f-LTR model with F_Full features |
RA | Standard LTR model with all 36 features |
RTH | Fusion of RT and RH |
RTHE | Fusion of RT, RH, and RE |
RTHEF | Fusion of RT, RH, RE, and RF |
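The fused rankers (RTH, RTHE, RTHEF) combine the outputs of the single-field rankers, and the results below carry a `_med` suffix. Assuming that suffix denotes median-based score fusion in the style of CombMED, a minimal sketch is shown here: each ranker's scores for a query are min-max normalized, and each document receives the median of its normalized scores. Treating a document missing from a ranker's output as scoring 0.0 is an assumption of this sketch, and the function names are hypothetical.

```python
from statistics import median


def min_max(scores):
    """Rescale one ranker's {doc_id: score} map for a query into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_med(runs):
    """Fuse several {doc_id: score} runs by the median of normalized scores.

    Documents a ranker did not return get 0.0 for that ranker (an assumption).
    Returns (doc_id, fused_score) pairs, best first; ties are broken by doc_id.
    """
    normalized = [min_max(run) for run in runs if run]
    docs = set().union(*normalized)
    fused = {d: median(run.get(d, 0.0) for run in normalized) for d in docs}
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Under this reading, RTHEF_med would correspond to `comb_med([rt_scores, rh_scores, re_scores, rf_scores])` for each query. Normalizing first matters because the field-level models can produce scores on very different scales.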
Algorithm | Ranker | uRBP (Understandability) | uRBPgr (Understandability) | P@10 (Topicality) | NDCG@10 (Topicality) | MAP (Topicality) |
---|---|---|---|---|---|---|
f-LTR | RT | 0.7131 | 0.3020 | 0.6820 | 0.6131 | 0.2428 |
f-LTR | RH | 0.6493 | 0.2630 | 0.6340 | 0.5683 | 0.2279 |
f-LTR | RE | 0.5849 | 0.2380 | 0.5700 | 0.4753 | 0.2115 |
f-LTR | RF | 0.6539 | 0.2750 | 0.6620 | 0.5395 | 0.2404 |
Standard LTR | RA | 0.5821 | 0.2550 | 0.6420 | 0.5687 | 0.2177 |
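The topicality columns report P@10, NDCG@10 and MAP. As a reading aid, the sketch below computes these standard metrics for one ranked list against graded relevance judgments (`qrels`, a dict from document id to grade). It uses linear gains with a log2(rank + 1) discount for NDCG; evaluation tools differ in such details, so this is an illustrative sketch rather than the official evaluation script.

```python
import math


def precision_at_k(ranking, qrels, k=10):
    """Fraction of the top-k slots holding a relevant (grade > 0) document."""
    return sum(1 for d in ranking[:k] if qrels.get(d, 0) > 0) / k


def dcg_at_k(gains, k):
    # Rank i (0-based) is discounted by log2(i + 2), i.e. log2(rank + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranking, qrels, k=10):
    """DCG of the ranking divided by the ideal DCG (IDCG) from the judgments."""
    gains = [qrels.get(d, 0) for d in ranking]
    idcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def average_precision(ranking, qrels):
    """Mean of precision values at the ranks of relevant documents (AP)."""
    n_relevant = sum(1 for g in qrels.values() if g > 0)
    if n_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if qrels.get(d, 0) > 0:
            hits += 1
            total += hits / rank
    return total / n_relevant


def mean_average_precision(runs, qrels_by_query):
    """MAP: AP averaged over every judged query (missing runs score 0)."""
    aps = [average_precision(runs.get(q, []), qrels_by_query[q]) for q in qrels_by_query]
    return sum(aps) / len(aps) if aps else 0.0
```

Note that AP divides by all judged relevant documents, including those never retrieved, which is why MAP values in the table are much lower than P@10.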
Algorithm | Ranker | uRBP (Understandability) | uRBPgr (Understandability) | P@10 (Topicality) | NDCG@10 (Topicality) | MAP (Topicality) |
---|---|---|---|---|---|---|
Fused f-LTR | RTH_med | 0.7055 | 0.3000 | 0.6740 | 0.6101 | 0.2492 |
Fused f-LTR | RTHE_med | 0.7220 | 0.3100 | 0.7040 | 0.6246 | 0.2500 |
Fused f-LTR | RTHEF_med | 0.7658 | 0.3280 | 0.7440 | 0.6630 | 0.2660 |
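The understandability columns use uRBP and uRBPgr, understandability-biased variants of rank-biased precision, where each rank k contributes (1 − ρ)·ρ^(k−1)·r(d_k)·u(d_k), with r the relevance and u the understandability of the document at rank k. The sketch below assumes binary r and u for uRBP and values in [0, 1] for a graded variant; the persistence default ρ = 0.8 and the mapping from assessor labels to r and u are placeholders, not the settings of the official evaluation.

```python
def urbp(ranking, relevance, understandability, rho=0.8, depth=None):
    """Understandability-biased RBP for one ranked list.

    relevance[d] and understandability[d] are in [0, 1] (0/1 for the binary
    variant); unjudged documents count as 0. rho is the user persistence.
    """
    docs = ranking if depth is None else ranking[:depth]
    # enumerate starts at 0, so rho ** k equals rho ** (rank - 1).
    return (1 - rho) * sum(
        rho ** k * relevance.get(d, 0.0) * understandability.get(d, 0.0)
        for k, d in enumerate(docs)
    )
```

A relevant but hard-to-read document therefore adds nothing to uRBP, which is how a ranker can score well on P@10 yet lower on uRBP.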
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, H.; Gonçalves, T. Improving Consumer Health Search with Field-Level Learning-to-Rank Techniques. Information 2024, 15, 695. https://doi.org/10.3390/info15110695