Health Data Information Retrieval

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Systems".

Deadline for manuscript submissions: closed (31 May 2024) | Viewed by 41773

Special Issue Editors


Guest Editor
Institute for High Performance Computing and Networking, National Research Council of Italy, 7-00185 Roma, Italy
Interests: e-health; security; electronic health records; interoperability

Guest Editor
Institute for High Performance Computing and Networking, National Research Council of Italy, 7-00185 Roma, Italy
Interests: software architectures; interoperability; information systems; health informatics

Special Issue Information

Dear Colleagues,

The MDPI Information Journal invites submissions to a Special Issue on “Health Data Information Retrieval”.

The increasing proliferation of digital health data coming from a variety of sources, including electronic health records, laboratory results, and personal device data, is bringing new opportunities for improving healthcare and research. Such data are characterized by wide heterogeneity in content, format, clinical domain, and representation, ranging from structured to unstructured forms (such as text, images, and signals). This raises a number of challenges relating to the effective storage, access, and searching of information for several purposes in the domain of healthcare.

Moreover, recent advances in machine learning, natural language processing, and big data analytics, along with the spread of health informatics standards, are able to facilitate systematic data collection and discovery from information retrieval systems.

Authors are invited to contribute by submitting original papers describing research findings, innovative solutions, and lessons learned in health data information retrieval. The aim of this Special Issue is to provide both a showcase to present the main research results in the area and an engine to introduce and explore new concepts.

Relevant topics of interest include but are not limited to the following:

  • Information extraction in healthcare;
  • Data processing in healthcare;
  • Distributed and large-scale health data access and management;
  • Information and networking security in healthcare;
  • Infrastructure and information services in healthcare;
  • Big data analytics in healthcare;
  • Search engines, big data search, and mining in healthcare;
  • Question answering for health information retrieval;
  • Query expansion, content classification, and clustering in healthcare;
  • NLP, machine learning, and computational linguistics for health information retrieval;
  • Web intelligence applications and search in healthcare.

Dr. Mario Ciampi
Dr. Mario Sicuranza
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Health information systems
  • Health data representation
  • Natural language processing
  • Big data analytics
  • Machine learning
  • Semantic search
  • Interoperability of networked information

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (12 papers)


Research

18 pages, 4559 KiB  
Article
Trends of Social Anxiety in University Students of Pakistan Post-COVID-19 Lockdown: A Healthcare Analytics Perspective
by Ikram E. Khuda, Azeem Aftab, Sajid Hasan, Samar Ikram, Sadique Ahmad, Abdelhamied Ashraf Ateya and Muhammad Asim
Information 2024, 15(7), 373; https://doi.org/10.3390/info15070373 - 28 Jun 2024
Viewed by 1630
Abstract
This paper disseminates our research findings that we conducted on university students in the years 2021, 2022, and 2023, with the year 2021 taken as the base year. Our research mined and excavated a concealed prevalence of social anxiety as an important and crucial facet of study anxiety in the university students of Pakistan. Using the Liebowitz Social Anxiety Scale (LSAS), we found a significant increase in the social anxiety level among university students in the past three years after the COVID-19 lockdown. Our data showed that the ‘very severe anxiety’ level soared up to 52.94% in the year 2023 from just 5.98% in the year 2021, showing a net increase of 47.06%. Statistical analyses demonstrate noteworthy differences in the overall social anxiety levels among the students, reaching significance at the 5% level and a discernable upward trend in the social anxiety levels as study anxiety. We also employed predictive analytics, including binary classifiers and generalized linear models with a 95% confidence interval, to identify individuals at risk. This study highlights a dynamic shift with escalating social anxiety levels among the university students and thus emphasizing its awareness, which is significantly important for the timely intervention, potentially preventing symptom escalation and improving outcomes. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)

21 pages, 10748 KiB  
Article
Modeling COVID-19 Transmission in Closed Indoor Settings: An Agent-Based Approach with Comprehensive Sensitivity Analysis
by Amir Hossein Ebrahimi, Ali Asghar Alesheikh, Navid Hooshangi, Mohammad Sharif and Abolfazl Mollalo
Information 2024, 15(6), 362; https://doi.org/10.3390/info15060362 - 19 Jun 2024
Viewed by 1181
Abstract
Computational simulation models have been widely used to study the dynamics of COVID-19. Among those, bottom-up approaches such as agent-based models (ABMs) can account for population heterogeneity. While many studies have addressed COVID-19 spread at various scales, insufficient studies have investigated the spread of COVID-19 within closed indoor settings. This study aims to develop an ABM to simulate the spread of COVID-19 in a closed indoor setting using three transmission sub-models. Moreover, a comprehensive sensitivity analysis encompassing 4374 scenarios is performed. The model is calibrated using data from Calabria, Italy. The results indicated a decent consistency between the observed and predicted number of infected people (MAPE = 27.94%, RMSE = 0.87 and χ2(1,N=34)=(44.11,p=0.11)). Notably, the transmission distance was identified as the most influential parameter in this model. In nearly all scenarios, this parameter had a significant impact on the outbreak dynamics (total cases and epidemic peak). Also, the calibration process showed that the movement of agents and the number of initial asymptomatic agents are vital model parameters to simulate COVID-19 spread accurately. The developed model may provide useful insights to investigate different scenarios and dynamics of other similar infectious diseases in closed indoor settings. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)

16 pages, 1836 KiB  
Article
Telehealth-Based Information Retrieval and Extraction for Analysis of Clinical Characteristics and Symptom Patterns in Mild COVID-19 Patients
by Edison Jahaj, Parisis Gallos, Melina Tziomaka, Athanasios Kallipolitis, Apostolos Pasias, Christos Panagopoulos, Andreas Menychtas, Ioanna Dimopoulou, Anastasia Kotanidou, Ilias Maglogiannis and Alice Georgia Vassiliou
Information 2024, 15(5), 286; https://doi.org/10.3390/info15050286 - 17 May 2024
Viewed by 1064
Abstract
Clinical characteristics of COVID-19 patients have been mostly described in hospitalised patients, yet most are managed in an outpatient setting. The COVID-19 pandemic transformed healthcare delivery models and accelerated the implementation and adoption of telemedicine solutions. We employed a modular remote monitoring system with multi-modal data collection, aggregation, and analytics features to monitor mild COVID-19 patients and report their characteristics and symptoms. At enrolment, the patients were equipped with wearables, which were associated with their accounts, provided the respective in-system consents, and, in parallel, reported the demographics and patient characteristics. The patients monitored their vitals and symptoms daily during a 14-day monitoring period. Vital signs were entered either manually or automatically through wearables. We enrolled 162 patients from February to May 2022. The median age was 51 (42–60) years; 44% were male, 22% had at least one comorbidity, and 73.5% were fully vaccinated. The vitals of the patients were within normal range throughout the monitoring period. Thirteen patients were asymptomatic, while the rest had at least one symptom for a median of 11 (7–16) days. Fatigue was the most common symptom, followed by fever and cough. Loss of taste and smell was the longest-lasting symptom. Age positively correlated with the duration of fatigue, anorexia, and low-grade fever. Comorbidities, the number of administered doses, the days since the last dose, and the days since the positive test did not seem to affect the number of sick days or symptomatology. The i-COVID platform allowed us to provide remote monitoring and reporting of COVID-19 outpatients. We were able to report their clinical characteristics while simultaneously helping reduce the spread of the virus through hospitals by minimising hospital visits. 
The monitoring platform also offered advanced knowledge extraction and analytic capabilities to detect health condition deterioration and automatically trigger personalised support workflows. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)

27 pages, 25304 KiB  
Article
Multiple Explainable Approaches to Predict the Risk of Stroke Using Artificial Intelligence
by Susmita S, Krishnaraj Chadaga, Niranjana Sampathila, Srikanth Prabhu, Rajagopala Chadaga and Swathi Katta S
Information 2023, 14(8), 435; https://doi.org/10.3390/info14080435 - 1 Aug 2023
Cited by 7 | Viewed by 4410
Abstract
Stroke occurs when a brain’s blood artery ruptures or the brain’s blood supply is interrupted. Due to rupture or obstruction, the brain’s tissues cannot receive enough blood and oxygen. Stroke is a common cause of mortality among older people. Hence, loss of life and severe brain damage can be avoided if stroke is recognized and diagnosed early. Healthcare professionals can discover solutions more quickly and accurately using artificial intelligence (AI) and machine learning (ML). As a result, we have shown how to predict stroke in patients using heterogeneous classifiers and explainable artificial intelligence (XAI). The multistack of ML models surpassed all other classifiers, with accuracy, recall, and precision of 96%, 96%, and 96%, respectively. Explainable artificial intelligence is a collection of frameworks and tools that aid in understanding and interpreting predictions provided by machine learning algorithms. Five diverse XAI methods, such as Shapley Additive Values (SHAP), ELI5, QLattice, Local Interpretable Model-agnostic Explanations (LIME) and Anchor, have been used to decipher the model predictions. This research aims to enable healthcare professionals to provide patients with more personalized and efficient care, while also providing a screening architecture with automated tools that can be used to revolutionize stroke prevention and treatment. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)

14 pages, 5861 KiB  
Article
An Efficient Healthcare Data Mining Approach Using Apriori Algorithm: A Case Study of Eye Disorders in Young Adults
by Kanza Gulzar, Muhammad Ayoob Memon, Syed Muhammad Mohsin, Sheraz Aslam, Syed Muhammad Abrar Akber and Muhammad Asghar Nadeem
Information 2023, 14(4), 203; https://doi.org/10.3390/info14040203 - 27 Mar 2023
Cited by 8 | Viewed by 4429
Abstract
In the public health sector and the field of medicine, the popularity of data mining and its usage in knowledge discovery and databases (KDD) are rising. The growing popularity of data mining has discovered innovative healthcare links to support decision making. For this reason, there is a great possibility to better diagnose patient’s diseases and maintain the quality of healthcare services in hospitals. So, there is an urgent need to make disease diagnosis possible by discovering the hidden patterns from the patients’ history information in developing countries. This work is a step towards how to use the extracted knowledge to enhance the quality of healthcare facilities. In this paper, we have proposed a web-centered hospital information management system (HIMS) that identifies frequent patterns from the data with eye disorder patients using the association rule-based Apriori data mining technique. The proposed framework has the capability to overcome all the key issues and problems in the current hospital information management system regarding data analysis and reporting services. For this purpose, data were collected from more than 1000 university students (China citizens) both online and manually (printed questionnaire). After applying the Apriori algorithm on the collected data, we revealed that almost 140 individuals out of 1035 had myopia (near-sighted disorder), at current age of 22 years, and that there were no male patients found with myopia. We concluded that their clinical relevance and utility can generate favorable results from prospective clinical studies by mapping out the habits or lifestyles that potentially lead to fatal diseases. In the future, we plan to extend this work to fully automate HIMS to help practitioners to diagnose the reasons of various diseases by extracting patient lifestyle patterns. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)

17 pages, 3515 KiB  
Article
Smart Machine Health Prediction Based on Machine Learning in Industry Environment
by Sagar Yeruva, Jeshmitha Gunuganti, Sravani Kalva, Surender Reddy Salkuti and Seong-Cheol Kim
Information 2023, 14(3), 181; https://doi.org/10.3390/info14030181 - 14 Mar 2023
Cited by 2 | Viewed by 4954
Abstract
In an industrial setting, consistent production and machine maintenance might help any company become successful. Machine health checking is a method of observing the status of a machine to predict mechanical mileage and predict the machine’s disappointment. The most often utilized traditional approaches are reactive and preventive maintenance. These approaches are unreliable and wasteful in terms of time and resource utilization. The use of system health management in conjunction with a predictive maintenance strategy allows for the scheduling of maintenance times in such a way that device malfunction is avoided, and thus the repercussions are avoided. IoT can help monitor equipment health and provide the best outcomes, especially in an industrial setting. Internet of Things (IoT) and machine learning models are quite successful in providing ongoing knowledge and comprehensive study on infrastructure performance. Our suggested technique uses a mobile application that seeks to anticipate the machine’s health status using a classification method utilizing IoT and machine learning technologies, which might benefit the industry environment by alerting the appropriate maintenance team before inflicting significant harm to the system and disrupting normal operations. A comparison of decision tree, XGBoost, SVM, and KNN performance has been carried out. According to our findings, XGBoost achieves higher classification accuracy compared to the other algorithms. As a result, this model is selected for creating a user-based application that allows the user to easily check the state of the machine’s health. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)

15 pages, 1615 KiB  
Article
Challenges Encountered and Lessons Learned when Using a Novel Anonymised Linked Dataset of Health and Social Care Records for Public Health Intelligence: The Sussex Integrated Dataset
by Elizabeth Ford, Richard Tyler, Natalie Johnston, Vicki Spencer-Hughes, Graham Evans, Jon Elsom, Anotida Madzvamuse, Jacqueline Clay, Kate Gilchrist and Melanie Rees-Roberts
Information 2023, 14(2), 106; https://doi.org/10.3390/info14020106 - 8 Feb 2023
Cited by 1 | Viewed by 3226
Abstract
Background: In the United Kingdom National Health Service (NHS), digital transformation programmes have resulted in the creation of pseudonymised linked datasets of patient-level medical records across all NHS and social care services. In the Southeast England counties of East and West Sussex, public health intelligence analysts based in local authorities (LAs) aimed to use the newly created “Sussex Integrated Dataset” (SID) for identifying cohorts of patients who are at risk of early onset multiple long-term conditions (MLTCs). Analysts from the LAs were among the first to have access to this new dataset. Methods: Data access was assured as the analysts were employed within joint data controller organisations and logged into the data via virtual machines following approval of a data access request. Analysts examined the demographics and medical history of patients against multiple external sources, identifying data quality issues and developing methods to establish true values for cases with multiple conflicting entries. Service use was plotted over timelines for individual patients. Results: Early evaluation of the data revealed multiple conflicting within-patient values for age, sex, ethnicity and date of death. This was partially resolved by creating a “demographic milestones” table, capturing demographic details for each patient for each year of the data available in the SID. Older data (≥5 y) was found to be sparse in events and diagnoses. Open-source code lists for defining long-term conditions were poor at identifying the expected number of patients, and bespoke code lists were developed by hand and validated against other sources of data. At the start, the age and sex distributions of patients submitted by GP practices were substantially different from those published by NHS Digital, and errors in data processing were identified and rectified. 
Conclusions: While new NHS linked datasets appear a promising resource for tracking multi-service use, MLTCs and health inequalities, substantial investment in data analysis and data architect time is necessary to ensure high enough quality data for meaningful analysis. Our team made conceptual progress in identifying the skills needed for programming analyses and understanding the types of questions which can be asked and answered reliably in these datasets. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)

19 pages, 4179 KiB  
Article
Case Study of Multichannel Interaction in Healthcare Services
by Ailton Moreira, Júlio Duarte and Manuel Filipe Santos
Information 2023, 14(1), 37; https://doi.org/10.3390/info14010037 - 7 Jan 2023
Cited by 6 | Viewed by 3315
Abstract
A multichannel interaction service is a practice whereby organizations communicate and interact with their existing customers and potential new customers through different channels. This article presents a brief case study of multichannel interaction in healthcare services, which studies the viability of continuous multichannel interaction for personalized healthcare services to enable health professionals to follow up and monitor patients in home-based care. Furthermore, this study aims to explore the possibility of the continuity and complementarity of the interactions across different communication channels with the patients. The data used for this study was gathered during the first wave of the COVID-19 pandemic. This study showed that despite this type of interaction being relatively new in healthcare services, it has considerable potential for improving the relationship between patients, health professionals, and care providers. Upon completion of the data analysis, several conclusions were drawn. One such conclusion was the ability to maintain continuity of interaction across multiple channels, as well as the synergy between the different channels of interaction available to patients and the impact this has on the way patients and health professionals interact. Additionally, it was determined that the complementarity of different interaction channels is crucial when implementing multichannel interaction services. Furthermore, the implementation of this solution resulted in improved communication between patients and health professionals. Also, it has decreased health professional’s workload and reduced care providers costs regarding remote patient follow-up. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)

13 pages, 1769 KiB  
Article
Semi-Automatic Systematic Literature Reviews and Information Extraction of COVID-19 Scientific Evidence: Description and Preliminary Results of the COKE Project
by Davide Golinelli, Andrea Giovanni Nuzzolese, Francesco Sanmarchi, Luana Bulla, Misael Mongiovì, Aldo Gangemi and Paola Rucci
Information 2022, 13(3), 117; https://doi.org/10.3390/info13030117 - 28 Feb 2022
Cited by 8 | Viewed by 4078
Abstract
The COVID-19 pandemic highlighted the importance of validated and updated scientific information to help policy makers, healthcare professionals, and the public. The speed in disseminating reliable information and the subsequent guidelines and policy implementation are also essential to save as many lives as possible. Trustworthy guidelines should be based on a systematic evidence review which uses reproducible analytical methods to collect secondary data and analyse them. However, the guidelines’ drafting process is time consuming and requires a great deal of resources. This paper aims to highlight the importance of accelerating and streamlining the extraction and synthesis of scientific evidence, specifically within the systematic review process. To do so, this paper describes the COKE (COVID-19 Knowledge Extraction framework for next generation discovery science) Project, which involves the use of machine reading and deep learning to design and implement a semi-automated system that supports and enhances the systematic literature review and guideline drafting processes. Specifically, we propose a framework for aiding in the literature selection and navigation process that employs natural language processing and clustering techniques for selecting and organizing the literature for human consultation, according to PICO (Population/Problem, Intervention, Comparison, and Outcome) elements. We show some preliminary results of the automatic classification of sentences on a dataset of abstracts related to COVID-19. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)

16 pages, 1938 KiB  
Article
A Privacy-Preserving and Standard-Based Architecture for Secondary Use of Clinical Data
by Mario Ciampi, Mario Sicuranza and Stefano Silvestri
Information 2022, 13(2), 87; https://doi.org/10.3390/info13020087 - 13 Feb 2022
Cited by 9 | Viewed by 6535
Abstract
The heterogeneity of the formats and standards of clinical data, which includes both structured, semi-structured, and unstructured data, in addition to the sensitive information contained in them, require the definition of specific approaches that are able to implement methodologies that can permit the extraction of valuable information buried under such data. Although many challenges and issues that have not been fully addressed still exist when this information must be processed and used for further purposes, the most recent techniques based on machine learning and big data analytics can support the information extraction process for the secondary use of clinical data. In particular, these techniques can facilitate the transformation of heterogeneous data into a common standard format. Moreover, they can also be exploited to define anonymization or pseudonymization approaches, respecting the privacy requirements stated in the General Data Protection Regulation, Health Insurance Portability and Accountability Act and other national and regional laws. In fact, compliance with these laws requires that only de-identified clinical and personal data can be processed for secondary analyses, in particular when data is shared or exchanged across different institutions. This work proposes a modular architecture capable of collecting clinical data from heterogeneous sources and transforming them into useful data for secondary uses, such as research, governance, and medical education purposes. The proposed architecture is able to exploit appropriate modules and algorithms, carry out transformations (pseudonymization and standardization) required to use data for the second purposes, as well as provide efficient tools to facilitate the retrieval and analysis processes. Preliminary experimental tests show good accuracy in terms of quantitative evaluations. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)

22 pages, 3001 KiB  
Article
A Text Mining Approach in the Classification of Free-Text Cancer Pathology Reports from the South African National Health Laboratory Services
by Okechinyere J. Achilonu, Victor Olago, Elvira Singh, René M. J. C. Eijkemans, Gideon Nimako and Eustasius Musenge
Information 2021, 12(11), 451; https://doi.org/10.3390/info12110451 - 30 Oct 2021
Cited by 6 | Viewed by 4122
Abstract
A cancer pathology report is a valuable medical document that provides information for clinical management of the patient and evaluation of health care. However, there are variations in the quality of reporting in free-text style formats, ranging from comprehensive to incomplete reporting. Moreover, the increasing incidence of cancer has generated a high throughput of pathology reports. Hence, manual extraction and classification of information from these reports can be intrinsically complex and resource-intensive. This study aimed to (i) evaluate the quality of over 80,000 breast, colorectal, and prostate cancer free-text pathology reports and (ii) assess the effectiveness of random forest (RF) and variants of support vector machine (SVM) in the classification of reports into benign and malignant classes. The study approach comprises data preprocessing, visualisation, feature selections, text classification, and evaluation of performance metrics. The performance of the classifiers was evaluated across various feature sizes, which were jointly selected by four filter feature selection methods. The feature selection methods identified established clinical terms, which are synonymous with each of the three cancers. Uni-gram tokenisation using the classifiers showed that the predictive power of RF model was consistent across various feature sizes, with overall F-scores of 95.2%, 94.0%, and 95.3% for breast, colorectal, and prostate cancer classification, respectively. The radial SVM achieved better classification performance compared with its linear variant for most of the feature sizes. The classifiers also achieved high precision, recall, and accuracy. This study supports a nationally agreed standard in pathology reporting and the use of text mining for encoding, classifying, and production of high-quality information abstractions for cancer prognosis and research. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)

Review

30 pages, 1536 KiB  
Review
From Data to Diagnosis: Machine Learning Revolutionizes Epidemiological Predictions
by Abdul Aziz Abdul Rahman, Gowri Rajasekaran, Rathipriya Ramalingam, Abdelrhman Meero and Dhamodharavadhani Seetharaman
Information 2024, 15(11), 719; https://doi.org/10.3390/info15110719 - 8 Nov 2024
Viewed by 651
Abstract
The outbreak of epidemiological diseases creates a major impact on humanity as well as on the world’s economy. The consequence of such infectious diseases affects the survival of mankind. The government has to stand up to the negative influence of these epidemiological diseases and facilitate society with medical resources and economical support. In recent times, COVID-19 has been one of the epidemiological diseases that created lethal effects and a greater slump in the economy. Therefore, the prediction of outbreaks is essential for epidemiological diseases. It may be either frequent or sudden infections in society. The unexpected raise in the application of prediction models in recent years is outstanding. A study on these epidemiological prediction models and their usage from the year 2018 onwards is highlighted in this article. The popularity of various prediction approaches is emphasized and summarized in this article. Full article
(This article belongs to the Special Issue Health Data Information Retrieval)
