Applications of Artificial Intelligence in Biomedical Data Analysis

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Biomedical Engineering".

Deadline for manuscript submissions: closed (31 August 2024) | Viewed by 26268

Special Issue Editors


Guest Editor
School of Computer Science and Engineering, Inha University, Yonghyeon-dong Nam-gu Inchon 402-751, Republic of Korea
Interests: algorithm; bioinformatics; data analysis

Guest Editor
Electronics and Telecommunications Research Institute, Daejeon, Korea
Interests: image processing; health-IT; bioinformatics; data mining

Special Issue Information

Dear Colleagues,

Biomedical data have grown explosively over the past decade. Falling gene sequencing costs have increased the volume of genomic data, and medical records have been digitized, so a very large amount of data is now available for analysis. This flood of biomedical information requires new thinking about how data can be used to enhance scientific understanding and improve bioresearch and healthcare services. Deep learning, which is developing alongside AI technology more broadly, provides new solutions for biomedical data analysis and can be used in clinical research. In this Special Issue, we want to address recent advances in the following topics related to AI:

  • Biomedical data analysis;
  • Biomedical engineering;
  • Bioinformatics;
  • Sequence analysis;
  • Time series data analysis.

Submissions are invited for both original research and review articles. We hope that this collection of papers will serve as an inspiration for those interested in the applications of Artificial Intelligence in biomedical informatics.

Dr. Jeong Seop Sim
Dr. SooJun Park
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • deep learning
  • biomedical data analysis
  • sequence analysis
  • bioinformatics

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (14 papers)


Research

32 pages, 6598 KiB  
Article
An Accurate Deep Learning-Based Computer-Aided Diagnosis System for Gastrointestinal Disease Detection Using Wireless Capsule Endoscopy Image Analysis
by Sameh Abd El-Ghany, Mahmood A. Mahmood and A. A. Abd El-Aziz
Appl. Sci. 2024, 14(22), 10243; https://doi.org/10.3390/app142210243 - 7 Nov 2024
Viewed by 605
Abstract
Peptic ulcers and stomach cancer are common conditions that impact the gastrointestinal (GI) system. Wireless capsule endoscopy (WCE) has emerged as a widely used, noninvasive technique for diagnosing these issues, providing valuable insights through the detailed imaging of the GI tract. Therefore, an early and accurate diagnosis of GI diseases is crucial for effective treatment. This paper introduces the Intelligent Learning Rate Controller (ILRC) mechanism that optimizes the training of deep learning (DL) models by adaptively adjusting the learning rate (LR) based on training progress. This helps improve convergence speed and reduce the risk of overfitting. The ILRC was applied to four DL models: EfficientNet-B0, ResNet101v2, InceptionV3, and InceptionResNetV2. These models were further enhanced using transfer learning, freezing layers, fine-tuning techniques, residual learning, and modern regularization methods. The models were evaluated on two datasets, the Kvasir-Capsule and KVASIR v2 datasets, which contain WCE images. The results demonstrated that the models, particularly when using ILRC, outperformed existing state-of-the-art methods in accuracy. On the Kvasir-Capsule dataset, the models achieved accuracies of up to 99.906%, and on the Kvasir-v2 dataset, they achieved up to 98.062%. This combination of techniques offers a robust solution for automating the detection of GI abnormalities in WCE images, significantly enhancing diagnostic efficiency and accuracy in clinical settings. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)

20 pages, 4612 KiB  
Article
Implementation of a Generative AI Algorithm for Virtually Increasing the Sample Size of Clinical Studies
by Anastasios Nikolopoulos and Vangelis D. Karalis
Appl. Sci. 2024, 14(11), 4570; https://doi.org/10.3390/app14114570 - 26 May 2024
Cited by 1 | Viewed by 1270
Abstract
Determining the appropriate sample size is crucial in clinical studies due to the potential limitations of small sample sizes in detecting true effects. This work introduces the use of Wasserstein Generative Adversarial Networks (WGANs) to create virtual subjects and reduce the need for recruiting actual human volunteers. The proposed idea suggests that only a small subset (“sample”) of the true population can be used along with WGANs to create a virtual population (“generated” dataset). To demonstrate the suitability of the WGAN-based approach, a new methodological procedure was also required to be established and applied. Monte Carlo simulations of clinical studies were performed to compare the performance of the WGAN-synthesized virtual subjects (i.e., the “generated” dataset) against both the entire population (the so-called “original” dataset) and a subset of it, the “sample”. After training and tuning the WGAN, various scenarios were explored, and the comparative performance of the three datasets was evaluated, as well as the similarity in the results against the population data. Across all scenarios tested, integrating WGANs and their corresponding generated populations consistently exhibited superior performance compared with those from samples alone. The generated datasets also exhibited quite similar performance compared with the “original” (i.e., population) data. By introducing virtual patients, WGANs effectively augment sample size, reducing the risk of type II errors. The proposed WGAN approach has the potential to decrease costs, time, and ethical concerns associated with human participation in clinical trials. Full article
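
The link between sample size and type II error that this abstract relies on can be illustrated with a small Monte Carlo sketch. The code is not the authors' WGAN procedure: it assumes a hypothetical two-arm study with normally distributed outcomes, a known standard deviation, and a two-sided z-test. The function name `estimated_power` and all parameter values are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def estimated_power(n_per_arm, effect=0.5, sd=1.0, alpha=0.05,
                    n_trials=500, seed=0):
    """Estimate the power (1 - type II error rate) of a two-arm study.

    Hypothetical setup: outcomes are normal with known `sd`, the treated
    arm is shifted by `effect`, and a two-sided z-test is applied.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two arm means.
    std_err = sd * (2 / n_per_arm) ** 0.5
    rejections = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
        z = (fmean(treated) - fmean(control)) / std_err
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        power = estimated_power(n)
        print(f"n per arm = {n:3d}  power = {power:.2f}  "
              f"type II error = {1 - power:.2f}")
```

Under these assumptions, the estimated type II error rate falls as the number of subjects per arm grows, which is the effect a larger (real or virtual) sample is meant to exploit.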

19 pages, 2760 KiB  
Article
Explainable Multimodal Graph Isomorphism Network for Interpreting Sex Differences in Adolescent Neurodevelopment
by Binish Patel, Anton Orlichenko, Adnan Patel, Gang Qu, Tony W. Wilson, Julia M. Stephen, Vince D. Calhoun and Yu-Ping Wang
Appl. Sci. 2024, 14(10), 4144; https://doi.org/10.3390/app14104144 - 14 May 2024
Viewed by 1131
Abstract
Background: A fundamental grasp of the variability observed in healthy individuals holds paramount importance in the investigation of neuropsychiatric conditions characterized by sex-related phenotypic distinctions. Functional magnetic resonance imaging (fMRI) serves as a meaningful tool for discerning these differences. Among deep learning models, graph neural networks (GNNs) are particularly well-suited for analyzing brain networks derived from fMRI blood oxygen level-dependent (BOLD) signals, enabling the effective exploration of sex differences during adolescence. Method: In the present study, we introduce a multi-modal graph isomorphism network (MGIN) designed to elucidate sex-based disparities using fMRI task-related data. Our approach amalgamates brain networks obtained from multiple scans of the same individual, thereby enhancing predictive capabilities and feature identification. The MGIN model adeptly pinpoints crucial subnetworks both within and between multi-task fMRI datasets. Moreover, it offers interpretability through the utilization of GNNExplainer, which identifies pivotal sub-network graph structures contributing significantly to sex group classification. Results: Our findings indicate that the MGIN model outperforms competing models in terms of classification accuracy, underscoring the benefits of combining two fMRI paradigms. Additionally, our model discerns the most significant sex-related functional networks, encompassing the default mode network (DMN), visual (VIS) network, cognitive (CNG) network, frontal (FRNT) network, salience (SAL) network, subcortical (SUB) network, and sensorimotor (SM) network associated with hand and mouth movements. Remarkably, the MGIN model achieves superior sex classification accuracy when juxtaposed with other state-of-the-art algorithms, yielding a noteworthy 81.67% improvement in classification accuracy. 
Conclusion: Our model’s superiority emanates from its capacity to consolidate data from multiple scans of subjects within a proven interpretable framework. Beyond its classification prowess, our model guides our comprehension of neurodevelopment during adolescence by identifying critical subnetworks of functional connectivity. Full article

23 pages, 5723 KiB  
Article
Layer-Weighted Attention and Ascending Feature Selection: An Approach for Seriousness Level Prediction Using the FDA Adverse Event Reporting System
by Bader Aldughayfiq, Hisham Allahem, Ayman Mohamed Mostafa, Mohammed Alnusayri and Mohamed Ezz
Appl. Sci. 2024, 14(8), 3280; https://doi.org/10.3390/app14083280 - 13 Apr 2024
Viewed by 961
Abstract
In this study, we introduce a novel combination of layer-static-weighted attention and ascending feature selection techniques to predict the seriousness level of adverse drug events using the Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). We utilized natural language processing (NLP) to analyze the terms in the active substance field, in addition to considering demographic and event information such as patient sex, healthcare provider qualification, and drug characterization. Our ascending feature selection method, which progressively incorporates additional features based on their importance, demonstrated continuous enhancements in prediction performance. Simultaneously, we employed a layer-static-weighted attention technique, which dynamically adjusts the model’s focus between natural language processing (NLP) and demographic features. This technique achieved its best performance at a balanced weight of 50%, yielding an average test accuracy of 74.56% and CV ROC score of 0.83 when 4000 features were included, indicating a compelling advantage to include a larger volume of meaningful features. By integrating these methodologies, we constructed a robust model capable of effectively predicting seriousness levels, offering significant potential for improving pharmacovigilance and enhancing drug safety monitoring. The results underscore the value of NLP and demographic data in predicting drug event seriousness and demonstrate the effectiveness of our combined techniques. We encourage further research to refine these methods and evaluate their application to other clinical datasets. Full article

19 pages, 1919 KiB  
Article
Applying a Recurrent Neural Network-Based Deep Learning Model for Gene Expression Data Classification
by Sergii Babichev, Igor Liakh and Irina Kalinina
Appl. Sci. 2023, 13(21), 11823; https://doi.org/10.3390/app132111823 - 29 Oct 2023
Cited by 3 | Viewed by 2769
Abstract
The importance of gene expression data processing in solving the classification task is determined by its ability to discern intricate patterns and relationships within genetic information, enabling the precise categorization and understanding of various gene expression profiles and their consequential impacts on biological processes and traits. In this study, we investigated various architectures and types of recurrent neural networks focusing on gene expression data. The effectiveness of the appropriate model was evaluated using various classification quality criteria based on type 1 and type 2 errors. Moreover, we calculated the integrated F1-score index using the Harrington desirability method, the value of which allowed us to improve the objectivity of the decision making when model effectiveness was evaluated. The final decision regarding model effectiveness was made based on a comprehensive classification quality criterion, which was calculated as the weighted sum of classification accuracy, integrated F1-score index, and loss function values. The simulation results show higher appeal of a single-layer GRU recurrent network with 75 neurons in the recurrent layer. We also compared convolutional and recurrent neural networks on gene expression data classification. Although convolutional neural networks showcase benefits in terms of loss function value and training time, a comparative analysis revealed that in terms of classification accuracy calculated on the test data subset, the GRU neural network model is slightly better than the CNN and LSTM models. The classification accuracy when using the GRU network was 97.2%; in other cases, it was 97.1%. In the first case, 954 out of 981 objects were correctly identified. In other cases, 952 objects were correctly identified. Full article

13 pages, 669 KiB  
Article
Detection of Unknown Polymorphic Patterns Using Feature-Extracting Part of a Convolutional Autoencoder
by Przemysław Kucharski and Krzysztof Ślot
Appl. Sci. 2023, 13(19), 10842; https://doi.org/10.3390/app131910842 - 29 Sep 2023
Viewed by 895
Abstract
Background: The present paper proposes a novel approach for detecting the presence of unknown polymorphic patterns in random symbol sequences that also comprise already known polymorphic patterns. Methods: We propose to represent rules that define the considered patterns as regular expressions and show how these expressions can be modeled using filter cascades of neural convolutional layers. We adopted a convolutional autoencoder (CAE) as a pattern detection framework. To detect unknown patterns, we first incorporated knowledge of known rules into the CAE’s convolutional feature extractor by fixing weights in some of its filter cascades. Then, we executed the learning procedure, where the weights of the remaining filters were driven by two different objectives. The first was to ensure correct sequence reconstruction, whereas the second was to prevent weights from learning the already known patterns. Results: The proposed methodology was tested on sample sequences derived from the human genome. The analysis of the experimental results provided statistically significant information on the presence or absence of polymorphic patterns that were not known in advance. Conclusions: The proposed method was able to detect the existence of unknown polymorphic patterns. Full article
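
The idea of expressing a pattern rule as a convolutional filter can be sketched for the simplest case. The code assumes a DNA alphabet, one-hot encoding, and rules made only of per-position character classes (for example `A[CG]T`). It uses a single hand-set filter rather than the trained filter cascades of the paper, and the helper names are hypothetical.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode each symbol as a 4-element indicator row (unknown symbols -> zeros)."""
    return [[1 if ch == a else 0 for a in ALPHABET] for ch in seq]


def class_filter(classes):
    """Build filter weights: 1 where a symbol is allowed at that position."""
    return [[1 if a in allowed else 0 for a in ALPHABET] for allowed in classes]


def conv_match(seq, classes):
    """Slide the filter over the sequence; a response equal to the filter
    length means every position matched its character class."""
    x = one_hot(seq)
    w = class_filter(classes)
    m = len(w)
    if m == 0:
        return []
    hits = []
    for start in range(len(x) - m + 1):
        response = sum(x[start + j][c] * w[j][c]
                       for j in range(m) for c in range(len(ALPHABET)))
        if response == m:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # The rule A[CG]T written as one character class per position.
    print(conv_match("TTACTAGTAAT", ["A", "CG", "T"]))
```

Thresholding the response at the filter length plays the role of an activation function here. Richer rules such as repetitions or alternations would need several stacked filters, which this sketch does not attempt.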

12 pages, 4620 KiB  
Article
Automated Clinical Impression Generation for Medical Signal Data Searches
by Woonghee Lee, Jaewoo Yang, Doyeong Park and Younghoon Kim
Appl. Sci. 2023, 13(15), 8931; https://doi.org/10.3390/app13158931 - 3 Aug 2023
Viewed by 1371
Abstract
Medical retrieval systems have become significantly important in clinical settings. However, commercial retrieval systems that heavily rely on term-based indexing face challenges when handling continuous medical data, such as electroencephalography data, primarily due to the high cost associated with utilizing neurologist analyses. With the increasing affordability of data recording systems, it becomes increasingly crucial to address these challenges. Traditional procedures for annotating, classifying, and interpreting medical data are costly, time consuming, and demand specialized knowledge. While cross-modal retrieval systems have been proposed to address these challenges, most concentrate on images and text, sidelining time-series medical data like electroencephalography data. As the interpretation of electroencephalography signals, which document brain activity, requires a neurologist’s expertise, this process is often the most expensive component. Therefore, a retrieval system capable of using text to identify relevant signals, eliminating the need for expert analysis, is desirable. Our research proposes a solution to facilitate the creation of indexing systems employing electroencephalography signals for report generation in situations where reports are pending a neurologist review. We introduce a method incorporating a convolutional-neural-network-based encoder from DeepSleepNet, which extracts features from electroencephalography signals, coupled with a transformer which learns the signal’s auto-correlation and the relationship between the signal and the corresponding report. Experimental evaluation using real-world data revealed our approach surpasses baseline methods. These findings suggest potential advancements in medical data retrieval and a decrease in reliance on expert knowledge for electroencephalography signal analysis. 
As such, our research represents a significant stride towards making electroencephalography data more comprehensible and utilizable in clinical environments. Full article

19 pages, 1618 KiB  
Article
A Hybrid Model of Cancer Diseases Diagnosis Based on Gene Expression Data with Joint Use of Data Mining Methods and Machine Learning Techniques
by Sergii Babichev, Lyudmyla Yasinska-Damri and Igor Liakh
Appl. Sci. 2023, 13(10), 6022; https://doi.org/10.3390/app13106022 - 14 May 2023
Cited by 7 | Viewed by 2019
Abstract
One of the current focuses of modern bioinformatics is the development of hybrid models to process gene expression data, in order to create diagnostic systems for various diseases. In this study, we propose a solution to this problem that combines an inductive spectral clustering algorithm, random forest classifier, convolutional neural network, and alternative voting method for making the final decision about patient condition. In the first stage, we apply the spectral clustering algorithm to gene expression profiles using inductive methods of objective clustering, with the calculation of internal, external, and balance clustering quality criteria. This results in clusters of mutually correlated and differently expressed gene expression profiles. In the second stage, we apply the random forest classifier and convolutional neural network to identify the examined objects, containing as attributes the gene expression values in the allocated clusters. The presented research solves both binary- and multi-classification tasks. The final decision about the patient’s condition is made using the alternative voting method, considering the classification results based on the gene expression data in various clusters. The simulation results showed that the proposed technique was highly effective, achieving a high accuracy in object identification when both classifiers were used. However, the convolutional neural network had a significantly higher data processing efficiency than the random forest algorithm, due to its substantially shorter processing time. Full article

11 pages, 5419 KiB  
Article
Web Interface of NER and RE with BERT for Biomedical Text Mining
by Yeon-Ji Park, Min-a Lee, Geun-Je Yang, Soo Jun Park and Chae-Bong Sohn
Appl. Sci. 2023, 13(8), 5163; https://doi.org/10.3390/app13085163 - 21 Apr 2023
Cited by 2 | Viewed by 2660
Abstract
The BioBERT Named Entity Recognition (NER) model is a high-performance model designed to identify both known and unknown entities. It surpasses previous NER models utilized by text-mining tools, such as tmTool and ezTag, in effectively discovering novel entities. In previous studies, the Biomedical Entity Recognition and Multi-Type Normalization Tool (BERN) employed this model to identify words that represent specific names, discern the type of the word, and implement it on a web page to offer NER service. However, we aimed to offer a web service that includes Relation Extraction (RE), a task determining the relation between entity pairs within a sentence. First, just like BERN, we fine-tuned the BioBERT NER model within the biomedical domain to recognize new entities. We identified two categories: diseases and genes/proteins. Additionally, we fine-tuned the BioBERT RE model to determine the presence or absence of a relation between the identified gene–disease entity pairs. The NER and RE results are displayed on a web page using the Django web framework. NER results are presented in distinct colors, and RE results are visualized as graphs in NetworkX and Cytoscape, allowing users to interact with the graphs. Full article

10 pages, 557 KiB  
Article
Order-Preserving Multiple Pattern Matching in Parallel
by Somin Park, Jinhyeok Park, Youngho Kim and Jeong Seop Sim
Appl. Sci. 2023, 13(8), 5142; https://doi.org/10.3390/app13085142 - 20 Apr 2023
Cited by 1 | Viewed by 1269
Abstract
The order-preserving multiple pattern matching problem is to find all substrings of a text T whose relative orders are the same as those of any pattern in a given set of patterns. Various sequential algorithms have been studied for the order-preserving multiple pattern matching problem. In this paper, we propose two parallel algorithms, one based on Aho–Corasick automata and the other on fingerprint tables. We also present experimental results comparing the execution times of the two parallel algorithms on various types of time-series data.
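
The problem statement can be made concrete with a naive sequential sketch. It is not either of the parallel algorithms proposed in the paper: it compares rank signatures window by window, assumes all values within a window are distinct, and uses hypothetical function names.

```python
from typing import Dict, List, Sequence, Tuple


def rank_signature(values: Sequence[float]) -> Tuple[int, ...]:
    """Return the rank of each element within the sequence (0 = smallest).

    Two sequences have the same relative order exactly when their
    signatures are equal (assuming distinct values).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return tuple(ranks)


def op_multi_match(text: Sequence[float],
                   patterns: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return (start, pattern_index) for every order-preserving occurrence."""
    by_signature: Dict[Tuple[int, ...], List[int]] = {}
    for k, pattern in enumerate(patterns):
        if len(pattern) > 0:
            by_signature.setdefault(rank_signature(pattern), []).append(k)
    lengths = sorted({len(p) for p in patterns if len(p) > 0})
    hits = []
    for start in range(len(text)):
        for m in lengths:
            if start + m > len(text):
                break
            signature = rank_signature(text[start:start + m])
            for k in by_signature.get(signature, []):
                hits.append((start, k))
    return hits


if __name__ == "__main__":
    series = [10, 20, 15, 30, 25, 40]
    shapes = [[1, 3, 2], [2, 1]]  # "up then partly down" and "down"
    print(op_multi_match(series, shapes))
```

Each window is re-ranked from scratch, so the sketch costs far more than automaton-based or fingerprint-based methods. It is meant only to pin down what counts as a match.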

18 pages, 2664 KiB  
Article
RNA Sequences-Based Diagnosis of Parkinson’s Disease Using Various Feature Selection Methods and Machine Learning
by Jingeun Kim, Hye-Jin Park and Yourim Yoon
Appl. Sci. 2023, 13(4), 2698; https://doi.org/10.3390/app13042698 - 20 Feb 2023
Cited by 2 | Viewed by 3029
Abstract
Parkinson’s disease is a neurodegenerative disease that is associated with genetic and environmental factors. However, the genes causing this degeneration have not been determined, and no reported cure exists for this disease. Recently, studies have been conducted to classify diseases with RNA-seq data using machine learning, and accurate diagnosis of diseases using machine learning is becoming an important task. In this study, we focus on how various feature selection methods can improve the performance of machine learning for accurate diagnosis of Parkinson’s disease. In addition, we analyzed the performance metrics and computational costs of running the model with and without various feature selection methods. Experiments were conducted using RNA sequencing—a technique that analyzes the transcription profiling of organisms using next-generation sequencing. Genetic algorithms (GA), information gain (IG), and wolf search algorithm (WSA) were employed as feature selection methods. Machine learning algorithms—extreme gradient boosting (XGBoost), deep neural network (DNN), support vector machine (SVM), and decision tree (DT)—were used as classifiers. Further, the model was evaluated using performance indicators, such as accuracy, precision, recall, F1 score, and receiver operating characteristic (ROC) curve. For XGBoost and DNN, feature selection methods based on GA, IG, and WSA improved the performance of machine learning by 10.00% and 38.18%, respectively. For SVM and DT, performance was improved by 0.91% and 7.27%, respectively, with feature selection methods based on IG and WSA. The results demonstrate that various feature selection methods improve the performance of machine learning when classifying Parkinson’s disease using RNA-seq data. Full article
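
Of the feature selection methods named in this abstract, information gain is the simplest to illustrate. The sketch assumes expression values have already been discretized (for example into high/low) and ranks features by how much they reduce class-label entropy. The function names, gene names, and toy data are hypothetical and do not come from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(columns,
                    key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    labels = ["PD", "PD", "HC", "HC"]       # toy diagnosis labels
    columns = {
        "gene_a": [1, 1, 0, 0],             # separates the classes perfectly
        "gene_b": [1, 0, 1, 0],             # carries no class information
    }
    print(top_k_features(columns, labels, k=1))
```

A classifier would then be trained only on the selected columns. The genetic algorithm and wolf search algorithm search over feature subsets instead of scoring features one at a time, and they are not sketched here.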

15 pages, 1714 KiB  
Article
A Wrapped Approach Using Unlabeled Data for Diabetic Retinopathy Diagnosis
by Xuefeng Zhang, Youngsung Kim, Young-Chul Chung, Sangcheol Yoon, Sang-Yong Rhee and Yong Soo Kim
Appl. Sci. 2023, 13(3), 1901; https://doi.org/10.3390/app13031901 - 1 Feb 2023
Cited by 2 | Viewed by 1492
Abstract
Large-scale datasets with sufficient and equal quantities of data in each class are the main factor in the success of deep-learning-based classification models for vision tasks. A shortage of data and an imbalanced interclass distribution, which often arise in the medical domain, cause modern deep neural networks to suffer greatly from imbalanced learning and overfitting. A diagnostic model of diabetic retinopathy (DR) trained on such a dataset using supervised learning is severely biased toward the majority class. To improve imbalanced learning, this study proposes leveraging retinal fundus images without human annotations through self-supervised or semi-supervised learning. The proposed approach to DR detection adds an auxiliary procedure to the target task, which identifies DR using supervised learning. The added procedure pre-trains the model on unlabeled data, learning features through self-supervised or semi-supervised learning, and the learned parameters are then transferred to the target model. This wrapper algorithm for learning from unlabeled data can help the model gain more information from samples in the minority class, thereby improving imbalanced learning to some extent. Comprehensive experiments demonstrate that the model trained with the proposed method outperformed the one trained with only the supervised learning baseline on the same data, with an accuracy improvement of 4–5%. A further comparison shows that the proposed method also performs much better than some state-of-the-art methods. In the case of EyePaCS, for example, the proposed method outperforms the customized CNN model by 9%. Through experiments, we further find that models trained with a smaller but balanced dataset are not worse than those trained with a larger but imbalanced dataset.
Therefore, our study reveals that utilizing unlabeled data can avoid the high cost of collecting and labeling large-scale medical datasets.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)

13 pages, 4331 KiB  
Article
Biomedical Text NER Tagging Tool with Web Interface for Generating BERT-Based Fine-Tuning Dataset
by Yeon-Ji Park, Min-a Lee, Geun-Je Yang, Soo Jun Park and Chae-Bong Sohn
Appl. Sci. 2022, 12(23), 12012; https://doi.org/10.3390/app122312012 - 24 Nov 2022
Cited by 2 | Viewed by 2960
Abstract
In this paper, a tagging tool is developed to streamline the process of locating tags for each term and manually selecting the target term. It directly extracts the terms to be tagged from sentences and displays them to the user. It also increases tagging efficiency by allowing users to apply candidate categories to untagged terms. It is based on annotations automatically generated using machine learning. Subsequently, this architecture is fine-tuned using Bidirectional Encoder Representations from Transformers (BERT) to enable the tagging of terms that cannot be captured using Named-Entity Recognition (NER). The tagged text data extracted using the proposed tagging tool can be used as an additional training dataset. The tagging tool, which receives and saves new NE annotation input online, is added to the NER and RE web interfaces using BERT. Annotation information downloaded by the user includes the category (e.g., diseases, genes/proteins) and the list of words associated with the named entity selected by the user. The results reveal that the RE and NER results are improved using the proposed web service by collecting more NE annotation data and fine-tuning the model using the generated datasets. Our application programming interfaces and demonstrations are available to the public via the website link provided in this paper.
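As an illustration of how category-plus-word-list annotations might become a fine-tuning dataset, the sketch below converts token-span annotations into per-token BIO labels, a format commonly used for BERT-based NER fine-tuning. The abstract does not state the tool's export format, so the BIO scheme, the `(start, end, category)` span layout, and the names `to_bio` and `to_conll` are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical annotation layout: (first token index, end index exclusive, category).
Annotation = Tuple[int, int, str]


def to_bio(tokens: List[str], annotations: List[Annotation]) -> List[str]:
    """Return one BIO label per token; untagged tokens get 'O'."""
    labels = ["O"] * len(tokens)
    for start, end, category in annotations:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span ({start}, {end}) is outside the sentence")
        if any(label != "O" for label in labels[start:end]):
            raise ValueError(f"span ({start}, {end}) overlaps an earlier span")
        labels[start] = f"B-{category}"          # first token of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{category}"          # continuation tokens
    return labels


def to_conll(tokens: List[str], labels: List[str]) -> str:
    """Serialize a sentence as tab-separated 'token<TAB>label' lines."""
    return "\n".join(f"{tok}\t{lab}" for tok, lab in zip(tokens, labels))


# Illustrative sentence and annotations (not taken from the paper).
tokens = ["BRCA1", "mutations", "increase", "breast", "cancer", "risk"]
annotations = [(0, 1, "GENE"), (3, 5, "DISEASE")]
labels = to_bio(tokens, annotations)
print(to_conll(tokens, labels))
```

Rejecting overlapping spans keeps each token mapped to exactly one label, which is what a standard token-classification head expects; nested entities would need a different scheme.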

23 pages, 2229 KiB  
Article
Hybrid Inductive Model of Differentially and Co-Expressed Gene Expression Profile Extraction Based on the Joint Use of Clustering Technique and Convolutional Neural Network
by Sergii Babichev, Lyudmyla Yasinska-Damri, Igor Liakh and Jiří Škvor
Appl. Sci. 2022, 12(22), 11795; https://doi.org/10.3390/app122211795 - 20 Nov 2022
Cited by 6 | Viewed by 1399
Abstract
The development of hybrid models focused on gene expression data processing for the allocation of differentially expressed and mutually correlated genes is one of the current directions in modern bioinformatics. Solving this problem can, on the one hand, improve the effectiveness of existing systems for complex disease diagnosis based on gene expression data analysis and, on the other hand, increase the efficiency of gene regulatory network reconstruction procedures through more careful selection of genes that considers the type of disease. In this research, we propose a stepwise procedure to form the subsets of mutually correlated and differentially expressed gene expression profiles (GEP). Firstly, we allocated informative GEPs in terms of statistical and entropy criteria using the Harrington desirability function. Then, we performed cluster analysis using SOTA and spectral clustering algorithms implemented within the framework of objective clustering inductive technology. The result of this step is a set of clusters containing co- and differentially expressed GEPs. Validation of the model was performed using a one-dimensional two-layer convolutional neural network (CNN). The analysis of the simulation results has shown the high efficiency of the proposed model. The clusters of GEPs formed based on the clustering quality criteria values allowed us to identify the investigated objects with high accuracy. Moreover, the simulation results have also shown that the hybrid inductive model based on the spectral clustering algorithm is more effective than the one based on the SOTA clustering algorithm in terms of both the complexity of the formed optimal cluster structure and the classification accuracy of the objects that contain the allocated gene expression data as attributes.
The proposed hybrid inductive model contributes to increasing objectivity during the formation of the subsets of differentially and co-expressed gene expression profiles for their further application in various disease diagnosis systems and for gene regulatory network reconstruction.
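The Harrington desirability function mentioned in the abstract maps a scaled criterion value y to a score d = exp(-exp(-y)) in (0, 1), and several partial scores are usually combined by a geometric mean. The sketch below shows that general idea only: the criterion names, their worst/best bounds, the linear mapping onto the interval [-2, 5], and all function names are assumptions for illustration, not the authors' settings.

```python
import math
from typing import Dict, List, Tuple


def harrington(y: float) -> float:
    """One-sided Harrington desirability: d = exp(-exp(-y))."""
    return math.exp(-math.exp(-y))


def to_scale(value: float, worst: float, best: float,
             y_min: float = -2.0, y_max: float = 5.0) -> float:
    """Linearly map a raw criterion onto [y_min, y_max] (assumed interval).

    `worst` may exceed `best` for criteria where smaller is better.
    Values outside the bounds are clamped.
    """
    t = (value - worst) / (best - worst)
    t = min(1.0, max(0.0, t))
    return y_min + t * (y_max - y_min)


def generalized_desirability(partials: List[float]) -> float:
    """Geometric mean of the partial desirabilities."""
    return math.exp(sum(math.log(d) for d in partials) / len(partials))


def score_profile(criteria: Dict[str, float],
                  bounds: Dict[str, Tuple[float, float]]) -> float:
    """Combine all bounded criteria of one profile into a single score."""
    partials = [harrington(to_scale(criteria[name], worst, best))
                for name, (worst, best) in bounds.items()]
    return generalized_desirability(partials)


# Hypothetical criteria: high variance and low entropy count as informative.
bounds = {"variance": (0.0, 4.0), "entropy": (3.0, 1.0)}
informative = score_profile({"variance": 3.5, "entropy": 1.2}, bounds)
uninformative = score_profile({"variance": 0.3, "entropy": 2.9}, bounds)
print(informative > uninformative)
```

Profiles could then be kept or discarded by thresholding the combined score; the threshold itself would be another choice that the abstract does not specify.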
