Applied Sciences

Research

32 pages, 6598 KiB

Open Access Article

An Accurate Deep Learning-Based Computer-Aided Diagnosis System for Gastrointestinal Disease Detection Using Wireless Capsule Endoscopy Image Analysis

by Sameh Abd El-Ghany, Mahmood A. Mahmood and A. A. Abd El-Aziz

Appl. Sci. 2024, 14(22), 10243; https://doi.org/10.3390/app142210243 - 7 Nov 2024

Viewed by 605

Abstract

Peptic ulcers and stomach cancer are common conditions that impact the gastrointestinal (GI) system. Wireless capsule endoscopy (WCE) has emerged as a widely used, noninvasive technique for diagnosing these conditions, providing valuable insights through detailed imaging of the GI tract. Early and accurate diagnosis of GI diseases is crucial for effective treatment. This paper introduces the Intelligent Learning Rate Controller (ILRC), a mechanism that optimizes the training of deep learning (DL) models by adaptively adjusting the learning rate (LR) based on training progress, which helps improve convergence speed and reduce the risk of overfitting. The ILRC was applied to four DL models: EfficientNet-B0, ResNet101v2, InceptionV3, and InceptionResNetV2. These models were further enhanced using transfer learning, layer freezing, fine-tuning, residual learning, and modern regularization methods. The models were evaluated on two datasets of WCE images, Kvasir-Capsule and Kvasir-v2. The results demonstrated that the models, particularly when using ILRC, outperformed existing state-of-the-art methods in accuracy, reaching up to 99.906% on the Kvasir-Capsule dataset and up to 98.062% on the Kvasir-v2 dataset. This combination of techniques offers a robust solution for automating the detection of GI abnormalities in WCE images, which can significantly enhance diagnostic efficiency and accuracy in clinical settings.
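The abstract does not spell out the ILRC update rule. As a point of reference only, the sketch below shows one common way to adapt the learning rate to training progress: decay it when the validation loss stops improving. The class name `PlateauLRController` and its parameters (`factor`, `patience`, `min_lr`) are hypothetical and are not the authors' ILRC.

```python
from dataclasses import dataclass


@dataclass
class PlateauLRController:
    """Hypothetical plateau-based LR controller (illustrative, not the paper's ILRC)."""
    lr: float = 1e-3
    factor: float = 0.5    # multiply the LR by this when progress stalls
    patience: int = 2      # epochs without improvement tolerated before decaying
    min_lr: float = 1e-6   # lower bound on the learning rate
    best: float = float("inf")
    wait: int = 0

    def step(self, val_loss: float) -> float:
        """Update the controller with this epoch's validation loss; return the LR to use next."""
        if val_loss < self.best:
            # Progress: remember the new best loss and reset the stall counter.
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                # Stalled for `patience` epochs: shrink the LR, but not below min_lr.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr
```

For the loss sequence 1.0, 0.8, 0.81, 0.82, 0.79, 0.80, 0.80 the rate is halved twice, ending at 2.5e-4. Keras ships a comparable rule as the `ReduceLROnPlateau` callback.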

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


20 pages, 4612 KiB

Open Access Article

Implementation of a Generative AI Algorithm for Virtually Increasing the Sample Size of Clinical Studies

by Anastasios Nikolopoulos and Vangelis D. Karalis

Appl. Sci. 2024, 14(11), 4570; https://doi.org/10.3390/app14114570 - 26 May 2024

Cited by 1 | Viewed by 1270

Abstract

Determining the appropriate sample size is crucial in clinical studies because small samples may lack the power to detect true effects. This work introduces the use of Wasserstein Generative Adversarial Networks (WGANs) to create virtual subjects and reduce the need to recruit actual human volunteers. The proposed idea is that only a small subset (the “sample”) of the true population can be used, together with WGANs, to create a virtual population (the “generated” dataset). To demonstrate the suitability of the WGAN-based approach, a new methodological procedure also had to be established and applied. Monte Carlo simulations of clinical studies were performed to compare the performance of the WGAN-synthesized virtual subjects (i.e., the “generated” dataset) against both the entire population (the so-called “original” dataset) and a subset of it, the “sample”. After the WGAN was trained and tuned, various scenarios were explored, and both the comparative performance of the three datasets and the similarity of their results to the population data were evaluated. Across all scenarios tested, the WGAN-generated populations consistently performed better than the samples alone. The generated datasets also performed quite similarly to the “original” (i.e., population) data. By introducing virtual patients, WGANs effectively augment the sample size, reducing the risk of type II errors. The proposed WGAN approach has the potential to decrease the costs, time, and ethical concerns associated with human participation in clinical trials.
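The link between sample size and type II error can be checked with a small Monte Carlo simulation. The sketch below does not implement a WGAN; it only estimates the power (1 minus the type II error rate) of a simulated two-arm study, assuming normally distributed outcomes with known standard deviation 1, a two-sided z-test at the 5% level, and a hypothetical effect size of 0.5.

```python
import math
import random


def simulated_power(n_per_arm: int, effect: float, n_trials: int = 2000,
                    z_crit: float = 1.96, seed: int = 0) -> float:
    """Fraction of simulated two-arm studies that reject H0 (two-sided z-test, sigma = 1 known)."""
    rng = random.Random(seed)             # seeded for reproducibility
    se = math.sqrt(2.0 / n_per_arm)       # standard error of the difference in means
    rejections = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n_per_arm)]
        z = (sum(treated) / n_per_arm - sum(control) / n_per_arm) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    for n in (10, 100):
        power = simulated_power(n, effect=0.5)
        print(f"n={n}: power={power:.3f}, type II error={1 - power:.3f}")
```

With 10 subjects per arm the true effect is usually missed, while 100 per arm detects it in most simulated studies; this is the gap that virtually enlarged samples aim to close.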

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


19 pages, 2760 KiB

Open Access Article

Explainable Multimodal Graph Isomorphism Network for Interpreting Sex Differences in Adolescent Neurodevelopment

by Binish Patel, Anton Orlichenko, Adnan Patel, Gang Qu, Tony W. Wilson, Julia M. Stephen, Vince D. Calhoun and Yu-Ping Wang

Appl. Sci. 2024, 14(10), 4144; https://doi.org/10.3390/app14104144 - 14 May 2024

Viewed by 1131

Abstract

Background: Understanding the variability observed in healthy individuals is of paramount importance for investigating neuropsychiatric conditions characterized by sex-related phenotypic differences. Functional magnetic resonance imaging (fMRI) is a meaningful tool for discerning these differences. Among deep learning models, graph neural networks (GNNs) are particularly well suited for analyzing brain networks derived from fMRI blood-oxygen-level-dependent (BOLD) signals, enabling the effective exploration of sex differences during adolescence. Method: In the present study, we introduce a multimodal graph isomorphism network (MGIN) designed to elucidate sex-based differences using task-related fMRI data. Our approach combines brain networks obtained from multiple scans of the same individual, thereby enhancing predictive capability and feature identification. The MGIN model pinpoints crucial subnetworks both within and between multi-task fMRI datasets. Moreover, it offers interpretability through GNNExplainer, which identifies the pivotal subnetwork graph structures that contribute most to sex group classification. Results: Our findings indicate that the MGIN model outperforms competing models in classification accuracy, underscoring the benefits of combining two fMRI paradigms. Additionally, our model identifies the most significant sex-related functional networks, encompassing the default mode network (DMN), visual (VIS) network, cognitive (CNG) network, frontal (FRNT) network, salience (SAL) network, subcortical (SUB) network, and sensorimotor (SM) network associated with hand and mouth movements. Remarkably, the MGIN model achieves superior sex classification accuracy compared with other state-of-the-art algorithms, yielding a noteworthy 81.67% improvement in classification accuracy.
Conclusion: Our model’s superiority stems from its capacity to consolidate data from multiple scans of subjects within a proven interpretable framework. Beyond its classification performance, our model advances our understanding of neurodevelopment during adolescence by identifying critical subnetworks of functional connectivity.
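For readers unfamiliar with graph isomorphism networks, the core GIN node update is h_v' = MLP((1 + eps) * h_v + sum of h_u over neighbors u of v), followed by a graph-level readout such as a sum over nodes. The minimal sketch below uses plain Python lists and replaces the MLP with a single linear map plus ReLU for brevity (a simplification). How MGIN fuses the two fMRI paradigms is not specified here; concatenating the per-scan readouts would be one plausible assumption.

```python
from typing import Dict, List

Vector = List[float]


def gin_layer(features: Dict[int, Vector], adjacency: Dict[int, List[int]],
              weight: List[Vector], eps: float = 0.0) -> Dict[int, Vector]:
    """One simplified GIN update: ReLU(W @ ((1 + eps) * h_v + sum of neighbor features))."""
    out: Dict[int, Vector] = {}
    for v, h_v in features.items():
        # Scale the node's own features by (1 + eps), then add each neighbor's features.
        agg = [(1.0 + eps) * x for x in h_v]
        for u in adjacency.get(v, []):
            agg = [a + b for a, b in zip(agg, features[u])]
        # A single linear map + ReLU stands in for the GIN MLP.
        out[v] = [max(0.0, sum(w * a for w, a in zip(row, agg))) for row in weight]
    return out


def sum_readout(features: Dict[int, Vector]) -> Vector:
    """Graph-level embedding: element-wise sum of all node embeddings."""
    dims = len(next(iter(features.values())))
    return [sum(h[d] for h in features.values()) for d in range(dims)]
```

On a three-node path graph with scalar features 1, 2, and 3 and an identity weight, the layer produces 3, 6, and 5, and the readout is 14.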

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


23 pages, 5723 KiB

Open Access Article

Layer-Weighted Attention and Ascending Feature Selection: An Approach for Seriousness Level Prediction Using the FDA Adverse Event Reporting System

by Bader Aldughayfiq, Hisham Allahem, Ayman Mohamed Mostafa, Mohammed Alnusayri and Mohamed Ezz

Appl. Sci. 2024, 14(8), 3280; https://doi.org/10.3390/app14083280 - 13 Apr 2024

Viewed by 961

Abstract

In this study, we introduce a novel combination of layer-static-weighted attention and ascending feature selection techniques to predict the seriousness level of adverse drug events using the Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). We used natural language processing (NLP) to analyze the terms in the active substance field, in addition to demographic and event information such as patient sex, healthcare provider qualification, and drug characterization. Our ascending feature selection method, which progressively incorporates additional features based on their importance, yielded continuous improvements in prediction performance. In parallel, we employed a layer-static-weighted attention technique, which weights the model’s focus between NLP and demographic features. This technique performed best at a balanced weight of 50%, yielding an average test accuracy of 74.56% and a cross-validated (CV) ROC score of 0.83 when 4000 features were included, which indicates a clear advantage to including a larger volume of meaningful features. By integrating these methods, we constructed a robust model capable of effectively predicting seriousness levels, offering significant potential for improving pharmacovigilance and drug safety monitoring. The results underscore the value of NLP and demographic data in predicting the seriousness of drug events and demonstrate the effectiveness of our combined techniques. We encourage further research to refine these methods and evaluate their application to other clinical datasets.
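The two ideas can be sketched independently. `fuse` blends the outputs of an NLP branch and a demographic branch with a fixed weight w (0.5 corresponds to the balanced setting reported above), and `ascending_selection` ranks features by importance and evaluates progressively larger top-k subsets. Both function names, the `evaluate` callback, and the `step` parameter are hypothetical; in practice `evaluate` would train and cross-validate a model on the chosen features.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def fuse(nlp_out: Sequence[float], demo_out: Sequence[float], w: float = 0.5) -> List[float]:
    """Static weighting of two branch outputs: w * NLP + (1 - w) * demographic."""
    return [w * a + (1.0 - w) * b for a, b in zip(nlp_out, demo_out)]


def ascending_selection(importance: Dict[str, float],
                        evaluate: Callable[[List[str]], float],
                        step: int) -> List[Tuple[int, float]]:
    """Evaluate the top-k most important features for k = step, 2*step, ..., all features."""
    ranked = sorted(importance, key=lambda f: importance[f], reverse=True)
    sizes = list(range(step, len(ranked) + 1, step))
    if not sizes or sizes[-1] != len(ranked):
        sizes.append(len(ranked))  # always include the full feature set last
    return [(k, evaluate(ranked[:k])) for k in sizes]
```

The returned (k, score) history makes it easy to see whether adding lower-ranked features keeps improving the score, which is the trend the authors report up to 4000 features.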

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


19 pages, 1919 KiB

Open Access Article

Applying a Recurrent Neural Network-Based Deep Learning Model for Gene Expression Data Classification

by Sergii Babichev, Igor Liakh and Irina Kalinina

Appl. Sci. 2023, 13(21), 11823; https://doi.org/10.3390/app132111823 - 29 Oct 2023

Cited by 3 | Viewed by 2769

Abstract

The importance of gene expression data processing for classification lies in its ability to discern intricate patterns and relationships within genetic information, enabling the precise categorization and understanding of gene expression profiles and their impacts on biological processes and traits. In this study, we investigated various architectures and types of recurrent neural networks for gene expression data. The effectiveness of each model was evaluated using various classification quality criteria based on type 1 and type 2 errors. Moreover, we calculated the integrated F1-score index using the Harrington desirability method, which made decision making more objective when model effectiveness was evaluated. The final decision regarding model effectiveness was based on a comprehensive classification quality criterion, calculated as the weighted sum of classification accuracy, the integrated F1-score index, and loss function values. The simulation results show that a single-layer GRU recurrent network with 75 neurons in the recurrent layer is the most appealing option. We also compared convolutional and recurrent neural networks for gene expression data classification. Although convolutional neural networks offer advantages in terms of loss function value and training time, a comparative analysis revealed that, in terms of classification accuracy on the test data subset, the GRU model is slightly better than the CNN and LSTM models. The classification accuracy of the GRU network was 97.2%, versus 97.1% for the other models: the GRU network correctly identified 954 of 981 objects, whereas the other models correctly identified 952.
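The evaluation pipeline described above can be sketched as follows. Accuracy is simply correct/total (954/981 is about 0.9725 for the GRU model). The one-sided Harrington desirability function is d = exp(-exp(-y)) on a coded scale y. The mapping of per-class F1 scores onto the coded range [-2, 5], the geometric-mean aggregation, and the weights and (1 - loss) term in the composite criterion are all assumptions for illustration, not the authors' exact settings.

```python
import math
from typing import Sequence, Tuple


def accuracy(correct: int, total: int) -> float:
    """Fraction of correctly identified objects."""
    return correct / total


def harrington_desirability(y: float) -> float:
    """One-sided Harrington desirability on a coded scale: d = exp(-exp(-y))."""
    return math.exp(-math.exp(-y))


def integrated_f1_index(f1_scores: Sequence[float],
                        y_min: float = -2.0, y_max: float = 5.0) -> float:
    """Hypothetical aggregation: map each F1 in [0, 1] linearly onto [y_min, y_max],
    convert to desirabilities, and combine them with a geometric mean."""
    ds = [harrington_desirability(y_min + f * (y_max - y_min)) for f in f1_scores]
    return math.exp(sum(math.log(d) for d in ds) / len(ds))


def composite_criterion(acc: float, f1_index: float, loss: float,
                        weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Hypothetical weighted sum; loss enters as (1 - loss) so that larger is better."""
    w_acc, w_f1, w_loss = weights
    return w_acc * acc + w_f1 * f1_index + w_loss * (1.0 - loss)
```

At y = 0 the desirability equals exp(-1), about 0.368, the conventional boundary between "acceptable" and "poor" on Harrington's scale.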

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


13 pages, 669 KiB

Open Access Article

Detection of Unknown Polymorphic Patterns Using Feature-Extracting Part of a Convolutional Autoencoder

by Przemysław Kucharski and Krzysztof Ślot

Appl. Sci. 2023, 13(19), 10842; https://doi.org/10.3390/app131910842 - 29 Sep 2023

Viewed by 895

Abstract

Background: This paper proposes a novel approach for detecting the presence of unknown polymorphic patterns in random symbol sequences that also contain already known polymorphic patterns. Methods: We propose representing the rules that define the considered patterns as regular expressions and show how these expressions can be modeled using cascades of convolutional filters. We adopted a convolutional autoencoder (CAE) as the pattern detection framework. To detect unknown patterns, we first incorporated knowledge of the known rules into the CAE’s convolutional feature extractor by fixing the weights in some of its filter cascades. Then, we executed the learning procedure, in which the weights of the remaining filters were driven by two objectives: the first was to ensure correct sequence reconstruction, and the second was to prevent those weights from learning the already known patterns. Results: The proposed methodology was tested on sample sequences derived from the human genome. Analysis of the experimental results provided statistically significant information on the presence or absence of polymorphic patterns that were not known in advance. Conclusions: The proposed method was able to detect the existence of unknown polymorphic patterns.
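To show how a regular expression can become a fixed convolutional filter, the sketch below handles the simplest case: a fixed-length pattern made of character classes, such as the regex `A[CG]T` over the DNA alphabet. The sequence is one-hot encoded, each filter position has weight 1 for every allowed base, and a match is any window whose score equals the pattern length. The helper names are hypothetical, and variable-length constructs (quantifiers, alternation of different lengths) would need the filter cascades described in the paper.

```python
from typing import List, Sequence

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as a list of one-hot rows over ALPHABET."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]


def class_pattern_filter(pattern: Sequence[str]) -> List[List[float]]:
    """Fixed (non-trainable) filter: pattern ["A", "CG", "T"] encodes the regex A[CG]T."""
    return [[1.0 if a in allowed else 0.0 for a in ALPHABET] for allowed in pattern]


def conv1d_valid(x: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Valid-mode 1D convolution (cross-correlation) summed over all channels."""
    k = len(kernel)
    return [sum(x[i + j][c] * kernel[j][c] for j in range(k) for c in range(len(ALPHABET)))
            for i in range(len(x) - k + 1)]


def match_positions(seq: str, pattern: Sequence[str]) -> List[int]:
    """Start positions where every pattern position is satisfied (score == pattern length)."""
    scores = conv1d_valid(one_hot(seq), class_pattern_filter(pattern))
    return [i for i, s in enumerate(scores) if s == len(pattern)]
```

For example, `match_positions("TACTAGTAAT", ["A", "CG", "T"])` finds the windows `ACT` and `AGT` at positions 1 and 4.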

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


12 pages, 4620 KiB

Open Access Article

Automated Clinical Impression Generation for Medical Signal Data Searches

by Woonghee Lee, Jaewoo Yang, Doyeong Park and Younghoon Kim

Appl. Sci. 2023, 13(15), 8931; https://doi.org/10.3390/app13158931 - 3 Aug 2023

Viewed by 1371

Abstract

Medical retrieval systems have become increasingly important in clinical settings. However, commercial retrieval systems that rely heavily on term-based indexing struggle to handle continuous medical data, such as electroencephalography data, primarily because of the high cost of obtaining neurologist analyses. As data recording systems become more affordable, addressing these challenges becomes increasingly crucial. Traditional procedures for annotating, classifying, and interpreting medical data are costly and time-consuming and demand specialized knowledge. While cross-modal retrieval systems have been proposed to address these challenges, most concentrate on images and text, sidelining time-series medical data such as electroencephalography data. Because interpreting electroencephalography signals, which record brain activity, requires a neurologist’s expertise, this step is often the most expensive component. Therefore, a retrieval system capable of using text to identify relevant signals, without the need for expert analysis, is desirable. Our research proposes a solution that facilitates the creation of indexing systems that use electroencephalography signals to generate reports in situations where reports are pending neurologist review. We introduce a method that combines a convolutional-neural-network-based encoder from DeepSleepNet, which extracts features from electroencephalography signals, with a transformer that learns the signal’s autocorrelation and the relationship between the signal and the corresponding report. Experimental evaluation using real-world data revealed that our approach surpasses baseline methods. These findings suggest potential advances in medical data retrieval and a reduced reliance on expert knowledge for electroencephalography signal analysis.
As such, our research represents a significant step towards making electroencephalography data more comprehensible and usable in clinical environments.
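Once a report has been generated for each signal, the signals can be found through ordinary term-based search. The sketch below builds a minimal inverted index from signal IDs to generated impressions and answers conjunctive text queries. The signal IDs and report texts are hypothetical placeholders standing in for model output, and a production system would add stemming, ranking, and negation handling.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(reports: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of signal IDs whose generated report contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for signal_id, report in reports.items():
        for term in tokenize(report):
            index[term].add(signal_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return IDs of signals whose reports contain every query term (sorted)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


if __name__ == "__main__":
    generated = {  # hypothetical model-generated impressions
        "eeg_001": "Generalized slowing with epileptiform discharges.",
        "eeg_002": "Focal epileptiform discharges over the left temporal region.",
        "eeg_003": "Normal awake record.",
    }
    idx = build_index(generated)
    print(search(idx, "epileptiform discharges"))  # ['eeg_001', 'eeg_002']
```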

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


19 pages, 1618 KiB

Open Access Article

A Hybrid Model of Cancer Diseases Diagnosis Based on Gene Expression Data with Joint Use of Data Mining Methods and Machine Learning Techniques

by Sergii Babichev, Lyudmyla Yasinska-Damri and Igor Liakh

Appl. Sci. 2023, 13(10), 6022; https://doi.org/10.3390/app13106022 - 14 May 2023

Cited by 7 | Viewed by 2019

Abstract

One of the current focuses of modern bioinformatics is the development of hybrid models for processing gene expression data in order to create diagnostic systems for various diseases. In this study, we propose a solution to this problem that combines an inductive spectral clustering algorithm, a random forest classifier, a convolutional neural network, and the alternative voting method for making the final decision about a patient’s condition. In the first stage, we apply the spectral clustering algorithm to gene expression profiles using inductive methods of objective clustering, with the calculation of internal, external, and balance clustering quality criteria. This yields clusters of mutually correlated and differentially expressed gene expression profiles. In the second stage, we apply the random forest classifier and the convolutional neural network to identify the examined objects, using the gene expression values in the allocated clusters as attributes. The presented research addresses both binary and multiclass classification tasks. The final decision about the patient’s condition is made using the alternative voting method, which considers the classification results obtained from the gene expression data in the various clusters. The simulation results showed that the proposed technique was highly effective, achieving high accuracy in object identification with both classifiers. However, the convolutional neural network was significantly more efficient than the random forest algorithm because of its substantially shorter processing time.
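The textbook alternative vote (instant-runoff voting) works as follows: count first preferences, stop if one class has a strict majority, otherwise eliminate the class with the fewest first preferences and repeat. The sketch assumes that each "ballot" is the list of class labels ranked by one cluster-specific classifier's predicted probabilities. The abstract does not specify the exact variant the authors use, so the ballot construction and the alphabetical tie-breaking here are assumptions.

```python
from collections import Counter
from typing import Sequence


def alternative_vote(ballots: Sequence[Sequence[str]]) -> str:
    """Instant-runoff decision over ranked ballots (ties broken alphabetically).
    Assumes at least one non-empty ballot."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        # Count each ballot's highest-ranked candidate that is still in the race.
        firsts: Counter = Counter()
        for ballot in ballots:
            for c in ballot:
                if c in remaining:
                    firsts[c] += 1
                    break
        total = sum(firsts.values())
        leader = max(sorted(remaining), key=lambda c: firsts[c])
        if firsts[leader] * 2 > total or len(remaining) == 1:
            return leader
        # No strict majority: drop the weakest candidate and recount.
        loser = min(sorted(remaining), key=lambda c: firsts[c])
        remaining.discard(loser)
```

With five cluster-level ballots where A and B each lead twice and C once, C is eliminated first; its ballot transfers to B, which then wins with three of five votes.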

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


11 pages, 5419 KiB

Open Access Article

Web Interface of NER and RE with BERT for Biomedical Text Mining

by Yeon-Ji Park, Min-a Lee, Geun-Je Yang, Soo Jun Park and Chae-Bong Sohn

Appl. Sci. 2023, 13(8), 5163; https://doi.org/10.3390/app13085163 - 21 Apr 2023

Cited by 2 | Viewed by 2660

Abstract

The BioBERT Named Entity Recognition (NER) model is a high-performance model designed to identify both known and unknown entities. It surpasses the NER models used in earlier text-mining tools, such as tmTool and ezTag, in discovering novel entities. In previous studies, the Biomedical Entity Recognition and Multi-Type Normalization Tool (BERN) employed this model to identify words that represent specific names, determine the type of each word, and provide an NER service through a web page. Our aim was to offer a web service that also includes Relation Extraction (RE), the task of determining the relation between entity pairs within a sentence. First, as in BERN, we fine-tuned the BioBERT NER model within the biomedical domain to recognize new entities, targeting two categories: diseases and genes/proteins. Additionally, we fine-tuned the BioBERT RE model to determine the presence or absence of a relation between the identified gene–disease entity pairs. The NER and RE results are displayed on a web page built with the Django web framework. NER results are presented in distinct colors, and RE results are visualized as graphs in NetworkX and Cytoscape, allowing users to interact with the graphs.
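Between the NER and RE steps, token-level tags must be turned into entity mentions and then into candidate gene–disease pairs for the relation classifier. The sketch below decodes BIO tags into (text, type) spans and forms all gene–disease pairs in a sentence. The tag names `Gene` and `Disease` are hypothetical, and the sketch assumes word-level tags (BioBERT's WordPiece sub-tokens would first need to be merged back into words).

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Convert BIO tags (e.g. B-Disease, I-Disease, O) into (entity_text, type) pairs."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for tok, tag in zip(tokens, tags):
        # A B- tag, or an I- tag whose type differs from the open span, starts a new entity.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            if current_type:
                spans.append((" ".join(current), current_type))
            current, current_type = [tok], tag[2:]
        elif tag.startswith("I-"):
            current.append(tok)
        else:  # "O" closes any open entity
            if current_type:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current_type:
        spans.append((" ".join(current), current_type))
    return spans


def candidate_pairs(spans: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """All (gene, disease) pairs in one sentence, to be scored by the RE model."""
    genes = [text for text, kind in spans if kind == "Gene"]
    diseases = [text for text, kind in spans if kind == "Disease"]
    return [(g, d) for g in genes for d in diseases]
```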

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


10 pages, 557 KiB

Open Access Article

Order-Preserving Multiple Pattern Matching in Parallel

by Somin Park, Jinhyeok Park, Youngho Kim and Jeong Seop Sim

Appl. Sci. 2023, 13(8), 5142; https://doi.org/10.3390/app13085142 - 20 Apr 2023

Cited by 1 | Viewed by 1269

Abstract

The order-preserving multiple pattern matching problem is to find all substrings of a text T whose relative order is the same as that of any pattern in a given set of patterns. Various sequential algorithms have been studied for this problem. In this paper, we propose two parallel algorithms, one based on Aho–Corasick automata and the other on fingerprint tables. We also present experimental results comparing the execution times of the two parallel algorithms on various types of time-series data.
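The matching criterion itself is easy to state in code. Two equal-length sequences are order-isomorphic exactly when their dense-rank signatures are equal, so a brute-force baseline can hash each pattern's signature and compare it against the signature of every text window. This sketch is sequential and uses neither Aho–Corasick automata nor the paper's fingerprint tables; a parallel version could split the window start positions among workers.

```python
from typing import Dict, List, Sequence, Tuple


def order_signature(values: Sequence[float]) -> Tuple[int, ...]:
    """Dense ranks: equal values share a rank, so equal signatures <=> order-isomorphic."""
    ranks = {v: r for r, v in enumerate(sorted(set(values)))}
    return tuple(ranks[v] for v in values)


def op_multi_match(text: Sequence[float],
                   patterns: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Brute-force order-preserving multiple pattern matching.
    Returns sorted (start_position, pattern_index) pairs."""
    # Group pattern indices by length, then by order signature.
    by_len: Dict[int, Dict[Tuple[int, ...], List[int]]] = {}
    for idx, p in enumerate(patterns):
        by_len.setdefault(len(p), {}).setdefault(order_signature(p), []).append(idx)
    matches: List[Tuple[int, int]] = []
    for m, sig_map in by_len.items():
        for i in range(len(text) - m + 1):
            for idx in sig_map.get(order_signature(text[i:i + m]), []):
                matches.append((i, idx))
    return sorted(matches)
```

For the series 10, 20, 15, 30, 25, 5, 8, 7 and the pattern (1, 3, 2), the matches start at positions 0, 2, and 5, because each of those windows has the shape "low, high, middle".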

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


18 pages, 2664 KiB

Open Access Article

RNA Sequences-Based Diagnosis of Parkinson’s Disease Using Various Feature Selection Methods and Machine Learning

by Jingeun Kim, Hye-Jin Park and Yourim Yoon

Appl. Sci. 2023, 13(4), 2698; https://doi.org/10.3390/app13042698 - 20 Feb 2023

Cited by 2 | Viewed by 3029

Abstract

Parkinson’s disease is a neurodegenerative disease associated with genetic and environmental factors. However, the genes causing this degeneration have not been determined, and no cure has been reported for this disease. Recently, studies have been conducted to classify diseases from RNA-seq data using machine learning, and accurate machine-learning-based diagnosis is becoming an important task. In this study, we focus on how various feature selection methods can improve the performance of machine learning for the accurate diagnosis of Parkinson’s disease. In addition, we analyzed the performance metrics and computational costs of running the model with and without various feature selection methods. Experiments were conducted using RNA sequencing (RNA-seq), a technique that analyzes the transcription profiles of organisms using next-generation sequencing. Genetic algorithms (GA), information gain (IG), and the wolf search algorithm (WSA) were employed as feature selection methods. Four machine learning algorithms, namely extreme gradient boosting (XGBoost), deep neural network (DNN), support vector machine (SVM), and decision tree (DT), were used as classifiers. The models were evaluated using performance indicators such as accuracy, precision, recall, F1 score, and the receiver operating characteristic (ROC) curve. For XGBoost and DNN, feature selection methods based on GA, IG, and WSA improved performance by 10.00% and 38.18%, respectively. For SVM and DT, performance was improved by 0.91% and 7.27%, respectively, with feature selection methods based on IG and WSA. The results demonstrate that various feature selection methods improve the performance of machine learning when classifying Parkinson’s disease using RNA-seq data.
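Of the three feature selection methods, information gain is the simplest to illustrate: IG(feature) = H(labels) - sum over feature values v of P(v) * H(labels | feature = v), with entropy H in bits. The sketch computes it for discrete features and keeps the top-k genes. RNA-seq expression values are continuous, so they are assumed to have been discretized (binned) beforehand; the gene names in the example are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(labels) minus the expected entropy of labels after splitting on the feature."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns: Dict[str, Sequence[Hashable]],
                   labels: Sequence[Hashable], k: int) -> List[str]:
    """Names of the k features with the highest information gain."""
    return sorted(columns, key=lambda g: information_gain(columns[g], labels), reverse=True)[:k]
```

A binned gene that perfectly separates patients from controls scores 1 bit on a balanced binary label, whereas a gene whose bins are independent of the diagnosis scores 0.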

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


15 pages, 1714 KiB

Open Access Article

A Wrapped Approach Using Unlabeled Data for Diabetic Retinopathy Diagnosis

by Xuefeng Zhang, Youngsung Kim, Young-Chul Chung, Sangcheol Yoon, Sang-Yong Rhee and Yong Soo Kim

Appl. Sci. 2023, 13(3), 1901; https://doi.org/10.3390/app13031901 - 1 Feb 2023

Cited by 2 | Viewed by 1492

Abstract

Large-scale datasets with sufficient and equal quantities of data in each class are the main factor in the success of deep-learning-based classification models for vision tasks. A shortage of data and an imbalanced class distribution, which often arise in the medical domain, cause modern deep neural networks to suffer greatly from imbalanced learning and overfitting. A diagnostic model of diabetic retinopathy (DR) trained on such a dataset using supervised learning is severely biased toward the majority class. To improve imbalanced learning, this study proposes leveraging retinal fundus images without human annotations through self-supervised or semi-supervised learning. The proposed approach to DR detection adds an auxiliary procedure to the target task, which identifies DR using supervised learning. This auxiliary procedure uses unlabeled data to pre-train a model that first learns features through self-supervised or semi-supervised learning; the pre-trained model and its learned parameters are then transferred to the target model. This wrapper algorithm for learning from unlabeled data can help the model gain more information from samples in the minority class, thereby improving imbalanced learning to some extent. Comprehensive experiments demonstrate that the model trained with the proposed method outperformed the supervised learning baseline trained on the same data, with an accuracy improvement of 4 to 5%. In a further comparison, the proposed method also performed much better than several state-of-the-art methods; on EyePACS, for example, it outperformed the customized CNN model by 9%. The experiments further showed that models trained with a smaller but balanced dataset are not worse than those trained with a larger but imbalanced dataset.
Therefore, our study shows that using unlabeled data can avoid the expensive cost of collecting and labeling large-scale medical datasets.
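The abstract does not say which semi-supervised technique is used. One common option is confidence-threshold pseudo-labeling: a model trained on the labeled fundus images predicts class probabilities for unlabeled images, and only confident predictions are kept as extra training labels. The optional per-class cap is one simple way to avoid re-amplifying the majority class. The threshold, the cap, and the image IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def pseudo_label(probs: Dict[str, Sequence[float]], threshold: float = 0.9,
                 max_per_class: Optional[int] = None) -> List[Tuple[str, int]]:
    """Select (image_id, predicted_class) pairs whose top probability >= threshold.
    Most confident predictions are taken first; max_per_class caps each class."""
    candidates = []
    for image_id, p in probs.items():
        best = max(range(len(p)), key=p.__getitem__)  # index of the top class
        if p[best] >= threshold:
            candidates.append((p[best], image_id, best))
    candidates.sort(key=lambda t: t[0], reverse=True)  # most confident first
    counts: Counter = Counter()
    selected: List[Tuple[str, int]] = []
    for _, image_id, cls in candidates:
        if max_per_class is None or counts[cls] < max_per_class:
            selected.append((image_id, cls))
            counts[cls] += 1
    return selected
```

With a cap of one image per class, a confident minority-class prediction is kept even if several majority-class images are slightly more certain.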

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


13 pages, 4331 KiB

Open Access Article

Biomedical Text NER Tagging Tool with Web Interface for Generating BERT-Based Fine-Tuning Dataset

by Yeon-Ji Park, Min-a Lee, Geun-Je Yang, Soo Jun Park and Chae-Bong Sohn

Appl. Sci. 2022, 12(23), 12012; https://doi.org/10.3390/app122312012 - 24 Nov 2022

Cited by 2 | Viewed by 2960

Abstract

In this paper, a tagging tool is developed to streamline the process of locating tags for each term and manually selecting the target term. The tool directly extracts the terms to be tagged from sentences and displays them to the user. It also increases tagging efficiency by allowing users to apply candidate categories to untagged terms. The tool is based on annotations automatically generated using machine learning. Subsequently, this architecture is fine-tuned using Bidirectional Encoder Representations from Transformers (BERT) to enable the tagging of terms that cannot be captured using Named-Entity Recognition (NER). The tagged text data extracted using the proposed tagging tool can be used as an additional training dataset. The tagging tool, which receives and saves new named-entity (NE) annotation input online, is added to the NER and relation extraction (RE) web interfaces that use BERT. Annotation information downloaded by the user includes the category (e.g., diseases, genes/proteins) and the list of words associated with the named entity selected by the user. The results reveal that the RE and NER results improve when the proposed web service is used to collect more NE annotation data and the model is fine-tuned on the generated datasets. Our application programming interfaces and demonstrations are available to the public via the website link provided in this paper.
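To reuse collected annotations as a fine-tuning dataset, they must be converted into the token-level labels that BERT-based NER training expects. The sketch below assumes annotations are stored as (start_token, end_token_exclusive, category) triples and exports them as BIO tags in a tab-separated, CoNLL-style layout. This storage format and output layout are assumptions for illustration; the abstract only states that downloads include the category and the associated words.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], entities: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end_exclusive, category) token spans into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, category in entities:
        tags[start] = f"B-{category}"        # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{category}"        # continuation tokens
    return tags


def to_conll(tokens: List[str], tags: List[str]) -> str:
    """One 'token<TAB>tag' line per token, a layout common in NER training data."""
    return "\n".join(f"{t}\t{g}" for t, g in zip(tokens, tags))
```

For example, annotating "TP53" as a gene and "lung cancer" as a disease in "TP53 is linked to lung cancer" yields B-Gene, O, O, O, B-Disease, I-Disease.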

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)


23 pages, 2229 KiB

Open Access Article

Hybrid Inductive Model of Differentially and Co-Expressed Gene Expression Profile Extraction Based on the Joint Use of Clustering Technique and Convolutional Neural Network

by Sergii Babichev, Lyudmyla Yasinska-Damri, Igor Liakh and Jiří Škvor

Appl. Sci. 2022, 12(22), 11795; https://doi.org/10.3390/app122211795 - 20 Nov 2022

Cited by 6 | Viewed by 1399

Abstract

The development of hybrid models for processing gene expression data to allocate differentially expressed and mutually correlated genes is one of the current directions in modern bioinformatics. Solving this problem can, on the one hand, improve the effectiveness of existing systems for complex disease diagnosis based on gene expression data analysis and, on the other hand, increase the efficiency of gene regulatory network reconstruction through more careful, disease-specific gene selection. In this research, we propose a stepwise procedure to form subsets of mutually correlated and differentially expressed gene expression profiles (GEPs). First, we select informative GEPs in terms of statistical and entropy criteria using the Harrington desirability function. Then, we perform cluster analysis using the SOTA and spectral clustering algorithms implemented within the framework of objective clustering inductive technology. This step yields a set of clusters containing co- and differentially expressed GEPs. The model was validated using a one-dimensional two-layer convolutional neural network (CNN). Analysis of the simulation results showed the high efficiency of the proposed model: the GEP clusters formed on the basis of the clustering quality criteria allowed us to identify the investigated objects with high accuracy. Moreover, the simulation results also showed that the hybrid inductive model based on the spectral clustering algorithm is more effective than the one based on the SOTA clustering algorithm, in terms of both the complexity of the resulting optimal cluster structure and the classification accuracy of objects that use the allocated gene expression data as attributes.
The proposed hybrid inductive model contributes to greater objectivity in forming subsets of differentially and co-expressed gene expression profiles for their further application in various disease diagnosis systems and in gene regulatory network reconstruction.
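Grouping "mutually correlated" profiles requires a similarity measure between gene expression profiles. A common choice is the Pearson correlation coefficient r, with the correlation distance 1 - r as input to a clustering algorithm; spectral clustering would instead use an affinity derived from r. The abstract does not state which metric the authors use, so this choice is an assumption. The sketch also assumes that no profile is constant, since zero variance makes r undefined.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance_matrix(profiles: List[Sequence[float]]) -> List[List[float]]:
    """Pairwise distances 1 - r: 0 for perfectly co-expressed, 2 for perfectly anti-correlated."""
    return [[1.0 - pearson(p, q) for q in profiles] for p in profiles]
```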

(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)



Published Papers (14 papers)
