Current Research in Biostatistics

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematical Biology".

Deadline for manuscript submissions: closed (30 June 2024) | Viewed by 18592

Special Issue Editors


Guest Editor
Department of Statistics and O.I., Faculty of Medicine, University of Granada, 18016 Granada, Spain
Interests: inference in diagnostic models; biostatistics; statistics e-learning; data science; medical statistics

Guest Editor
Department of Biostatistics, University of Granada, 18071 Granada, Spain
Interests: predictive models; inference in diagnostic models; biostatistics; scale validation; teaching biostatistics

Special Issue Information

Dear Colleagues,

Biostatistics involves the application of statistics in the biomedical and health sciences. Statistics provides rigorous methodology with which to address typical medical or health problems.

This Special Issue aims to present the developing applications of biostatistics in various areas. Such areas include, but are not limited to:

  • Diagnostic or prognostic models.
  • Causal inference.
  • Analysis of large databases: data fusion.
  • Missing data.
  • Teaching statistics in the health sciences.
  • Stochastic models in biology and health sciences.
  • Spatio-temporal distribution of disease.
  • Measures of association in 2x2 tables.
  • Development of computer packages for biostatistics.
  • Experimental designs.
  • Controlled clinical trials.
  • Information fusion.
  • Combination of observational databases.
  • Estimation of prevalence and incidence under different sampling schemes.

All articles should present a problem statement, the methodology used to solve it and a resolution.

Dr. Miguel Ángel Montero-Alonso
Prof. Dr. Juan De Dios Luna del Castillo
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • diagnostic models
  • causal inference
  • computational biostatistics
  • measures of association
  • prognostic models

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (11 papers)


Research

16 pages, 362 KiB  
Article
Homogeneity Test of Ratios of Two Proportions in Stratified Bilateral and Unilateral Data
by Huipei Wang and Chang-Xing Ma
Mathematics 2024, 12(17), 2719; https://doi.org/10.3390/math12172719 - 30 Aug 2024
Viewed by 671
Abstract
In paired-organ studies in fields such as ophthalmology, otolaryngology, and rheumatology, various approaches take highly correlated bilateral data into account for homogeneity tests but are less likely to focus on combined bilateral and unilateral data structures. It is also necessary to adjust for the effect of confounders on stratified combined bilateral and unilateral data since, in these data structures, ignoring intra-class correlation and confounding effects can cause biased statistical inference. This article derives three homogeneity tests (the likelihood ratio test, the Wald test, and the score test) for these combined data structures to detect whether ratios of proportions remain consistent across strata. Simulation shows that the score test provides a robust Type I error rate and satisfactory power performance. Finally, a real example is used to demonstrate the application of the three proposed tests.
(This article belongs to the Special Issue Current Research in Biostatistics)

30 pages, 9253 KiB  
Article
Bayesian Deep Learning and Bayesian Statistics to Analyze the European Countries’ SARS-CoV-2 Policies
by Hamed Khalili, Maria A. Wimmer and Ulf Lotzmann
Mathematics 2024, 12(16), 2574; https://doi.org/10.3390/math12162574 - 20 Aug 2024
Viewed by 793
Abstract
Even if the SARS-CoV-2 pandemic recedes, research regarding the effectiveness of government policies to contain the spread of the pandemic remains important. In this study, we analyze the impact of a set of epidemiological factors on the spread of SARS-CoV-2 in 30 European countries, which were applied from early 2020 up to mid-2022. We combine four data sets encompassing each country’s non-pharmaceutical interventions (NPIs, including 66 government intervention types), distributions of 31 virus types, and accumulated percentage of vaccinated population (by the first five doses) as well as the reported infections, each on a daily basis. First, a Bayesian deep learning model is trained to predict the reproduction rate of the virus one month ahead of each day. Based on the trained deep learning model, the importance of relevant influencing factors and the magnitude of their effects on the outcome of the neural network model are computed by applying explainable machine learning algorithms. Second, in order to re-examine the results of the deep learning model, a Bayesian statistical analysis is implemented. In the statistical analysis, for each influencing input factor in each country, the distributions of pandemic growth rates are compared for days where the factor was active with days where the same factor was not active. The results of the deep learning model and the results of the statistical inference model coincide to a significant extent. We conclude with reflections with regard to the most influential factors on SARS-CoV-2 spread within European countries.

15 pages, 5634 KiB  
Article
Homogeneous Ensemble Feature Selection for Mass Spectrometry Data Prediction in Cancer Studies
by Yulan Liang, Amin Gharipour, Erik Kelemen and Arpad Kelemen
Mathematics 2024, 12(13), 2085; https://doi.org/10.3390/math12132085 - 3 Jul 2024
Viewed by 818
Abstract
The identification of important proteins is critical for the medical diagnosis and prognosis of common diseases. Diverse sets of computational tools have been developed for omics data reduction and protein selection. However, standard statistical models with single-feature selection involve the multi-testing burden of low power with limited available samples. Furthermore, high correlations among proteins with high redundancy and moderate effects often lead to unstable selections and cause reproducibility issues. Ensemble feature selection in machine learning (ML) may identify a stable set of disease biomarkers that could improve the prediction performance of subsequent classification models and thereby simplify their interpretability. In this study, we developed a three-stage homogeneous ensemble feature selection (HEFS) approach for both identifying proteins and improving prediction accuracy. This approach was implemented and applied to ovarian cancer proteogenomics datasets comprising (1) binary putative homologous recombination deficiency (HRD)-positive or -negative samples and (2) multiple mRNA classes (differentiated, proliferative, immunoreactive, mesenchymal, and unknown samples). We conducted and compared various ML methods with HEFS including random forest (RF), support vector machine (SVM), and neural network (NN) for predicting both binary and multiple-class outcomes. The results indicated that the prediction accuracies varied for both binary and multiple-class classifications using various ML approaches with the proposed HEFS method. RF and NN provided better prediction accuracies than simple Naive Bayes or logistic models. For binary outcomes, with a sample size of 122 and nine selected prediction proteins using our proposed three-stage HEFS approach, the best ensemble ML (Treebag) achieved 83% accuracy, 85% sensitivity, and 81% specificity. For multiple (five)-class outcomes, the proposed HEFS-selected proteins combined with Principal Component Analysis (PCA) in NN resulted in prediction accuracies for multiple-class classifications ranging from 75% to 96% for each of the five classes. Despite the different prediction accuracies of the various models, HEFS identified consistent sets of proteins linked to the binary and multiple-class outcomes.

23 pages, 9379 KiB  
Article
Comparison of Feature Selection Methods—Modelling COPD Outcomes
by Jorge Cabral, Pedro Macedo, Alda Marques and Vera Afreixo
Mathematics 2024, 12(9), 1398; https://doi.org/10.3390/math12091398 - 3 May 2024
Viewed by 982
Abstract
Selecting features associated with patient-centered outcomes is of major relevance, yet the importance given depends on the method. We aimed to compare stepwise selection, least absolute shrinkage and selection operator, random forest, Boruta, extreme gradient boosting and generalized maximum entropy estimation and suggest an aggregated evaluation. We also aimed to describe outcomes in people with chronic obstructive pulmonary disease (COPD). Data from 42 patients were collected at baseline and at 5 months. Acute exacerbations were the aggregated most important feature in predicting the difference in the handgrip muscle strength (dHMS) and the COVID-19 lockdown group had an increased dHMS of 3.08 kg (CI95 ≈ [0.04, 6.11]). Pack-years achieved the highest importance in predicting the difference in the one-minute sit-to-stand test and no clinical change during lockdown was detected. Charlson comorbidity index was the most important feature in predicting the difference in the COPD assessment test (dCAT) and participants with severe values are expected to have a decreased dCAT of 6.51 points (CI95 ≈ [2.52, 10.50]). Feature selection methods yield inconsistent results, particularly extreme gradient boosting and random forest compared with the remaining methods. Models with features ordered by median importance had a meaningful clinical interpretation. Lockdown seems to have had a negative impact on upper-limb muscle strength.

17 pages, 440 KiB  
Article
Statistical Considerations for Analyzing Data Derived from Long Longitudinal Cohort Studies
by Rocío Fernández-Iglesias, Pablo Martínez-Camblor, Adonina Tardón and Ana Fernández-Somoano
Mathematics 2023, 11(19), 4070; https://doi.org/10.3390/math11194070 - 25 Sep 2023
Viewed by 1012
Abstract
Modern science is frequently based on the exploitation of large volumes of information stored in datasets and involving complex computational architectures. The statistical analyses of these datasets have to cope with specific challenges and frequently involve making informed but arbitrary decisions. Epidemiological papers have to be concise and focused on the underlying clinical or epidemiological results, not reporting the details behind relevant methodological decisions. In this work, we used an analysis of the cardiovascular-related measures tracked in 4–8-year-old children, using data from the INMA-Asturias cohort, to illustrate how the decision-making process was performed and its potential impact on the obtained results. We focused on two particular aspects of the problem: how to deal with missing data and which regression model to use to evaluate tracking when there are no defined thresholds to categorize variables into risk groups. As a spoiler, we analyzed the impact on our results of using multiple imputation and the advantage of using quantile regression models in this context.

10 pages, 841 KiB  
Article
Subgroup Identification in Survival Outcome Data Based on Concordance Probability Measurement
by Shengli An, Peter Zhang and Hong-Bin Fang
Mathematics 2023, 11(13), 2855; https://doi.org/10.3390/math11132855 - 26 Jun 2023
Cited by 1 | Viewed by 1232
Abstract
Identifying a subgroup of patients who may have an enhanced treatment effect in a randomized clinical trial has received increasing attention recently. For time-to-event outcomes, it is a challenge to define the effectiveness of a treatment and to choose a cutoff time point for identifying subgroup membership, especially in trials in which the two treatment arms do not differ in overall survival. In this paper, we propose a mixture cure model to identify a subgroup for a new treatment that was compared to a classical treatment (or placebo) in a randomized clinical trial with respect to survival time. Using the concordance probability measurement (K-index), we propose a statistic to test the existence of subgroups with effective treatments in the treatment arm. Subsequently, the subgroup is defined by a limited number of covariates based on the estimated area under the curve (AUC). The performance of this method in different scenarios is assessed through simulation studies. A real data example is also provided for illustration.

11 pages, 595 KiB  
Article
Scrambling Reports: New Estimators for Estimating the Population Mean of Sensitive Variables
by Pablo O. Juárez-Moreno, Agustín Santiago-Moreno, José M. Sautto-Vallejo and Carlos N. Bouza-Herrera
Mathematics 2023, 11(11), 2572; https://doi.org/10.3390/math11112572 - 4 Jun 2023
Cited by 1 | Viewed by 1247
Abstract
Warner proposed a methodology called randomized response techniques, which, through the random scrambling of sensitive variables, allows the non-response rate to be reduced and the response bias to be diminished. In this document, we present a randomized response technique using simple random sampling. The scrambling of the sensitive variable is performed through the selection of a report R_i, i = 1, 2, 3. In order to evaluate the accuracy and efficiency of the proposed estimators, a simulation was carried out with two databases, where the sensitive variables are the destruction of poppy crops in Guerrero, Mexico, and the age at first sexual intercourse. The results show that more accurate estimates are obtained with the proposed model.
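For readers unfamiliar with the randomized response idea mentioned in this abstract, the following minimal sketch shows Warner's classic estimator for a binary sensitive attribute. It is not the scrambling-report estimator proposed in the article, which targets the mean of a quantitative variable. The function name `warner_estimate` and its parameters are hypothetical, and the sketch assumes each respondent answers the sensitive question with known probability `p_design` and its complement otherwise.

```python
def warner_estimate(yes_count: int, n: int, p_design: float) -> float:
    """Estimate the proportion holding a sensitive attribute (Warner's model).

    Assumption: with probability p_design a respondent is asked
    "Do you have the attribute?", otherwise "Do you NOT have it?".
    Then P(yes) = p_design * pi + (1 - p_design) * (1 - pi), which is
    solved for pi below.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < p_design < 1.0 or p_design == 0.5:
        raise ValueError("p_design must lie in (0, 1) and differ from 0.5")
    yes_rate = yes_count / n  # observed share of "yes" answers
    # The raw estimate can fall outside [0, 1] in small samples.
    return (yes_rate - (1.0 - p_design)) / (2.0 * p_design - 1.0)
```

For example, with `p_design = 0.7` and a true proportion of 0.3, the expected share of "yes" answers is 0.42, and the estimator maps 0.42 back to roughly 0.3.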

18 pages, 857 KiB  
Article
Diagnosing Vascular Aging Based on Macro and Micronutrients Using Ensemble Machine Learning
by Carmen Patino-Alonso, Marta Gómez-Sánchez, Leticia Gómez-Sánchez, Emiliano Rodríguez-Sánchez, Cristina Agudo-Conde, Luis García-Ortiz and Manuel A Gómez-Marcos
Mathematics 2023, 11(7), 1645; https://doi.org/10.3390/math11071645 - 29 Mar 2023
Cited by 1 | Viewed by 1679
Abstract
The influence of dietary components on vascular dysfunction and aging is unclear. This study therefore aims to propose a model to predict the influence of macro and micronutrients on accelerated vascular aging in a Spanish population without previous cardiovascular disease. This cross-sectional study involved a total of 501 individuals aged between 35 and 75 years. Carotid-femoral pulse wave velocity (cfPWV) was measured using a SphygmoCor® device. Carotid intima-media thickness (cIMT) was measured using a Sonosite Micromax® ultrasound machine. The Vascular Aging Index (VAI) was estimated according to VAI = (ln(1.09) × 10 × cIMT + ln(1.14) × cfPWV) × 39.1 + 4.76. Vascular aging was defined considering the presence of a vascular lesion and the 75th percentile (p75) of VAI by age and sex, following two steps. Step 1: subjects were labelled as early vascular aging (EVA) if they had a peripheral arterial disease or carotid artery lesion. Step 2: they were classified as EVA if the VAI value was >p75 and as normal vascular aging (NVA) if it was ≤p75. To build the model, we used machine learning algorithms to analyse the association between macro and micronutrients and vascular aging. In this article, we propose the AdXGRA model, a stacked ensemble learning model for diagnosing vascular aging from macro and micronutrients. The proposed model uses four classifiers, AdaBoost (ADB), extreme gradient boosting (XGB), generalized linear model (GLM), and random forest (RF) at the first level, and then combines their predictions by using a second-level multilayer perceptron (MLP) classifier to achieve better performance. The model obtained an accuracy of 68.75% in prediction, with a sensitivity of 66.67% and a specificity of 68.79%. The seven main variables related to EVA in the proposed model were sodium, waist circumference, polyunsaturated fatty acids (PUFA), monounsaturated fatty acids (MUFA), total protein, calcium, and potassium. These results suggest that total protein, PUFA, and MUFA are the macronutrients, and calcium and potassium are the micronutrients related to EVA.
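The Vascular Aging Index formula quoted in this abstract can be sketched as a small function. This is only an illustration: it assumes the formula reads as (ln(1.09) × 10 × cIMT + ln(1.14) × cfPWV) × 39.1 + 4.76, and that cIMT is given in millimetres and cfPWV in metres per second, neither of which the abstract states. The function and parameter names are hypothetical.

```python
import math


def vascular_aging_index(cimt_mm: float, cfpwv_m_s: float) -> float:
    """Compute the VAI under the reading of the formula described above.

    Assumptions (not confirmed by the abstract): cimt_mm is carotid
    intima-media thickness in mm, cfpwv_m_s is carotid-femoral pulse
    wave velocity in m/s.
    """
    weighted_sum = (
        math.log(1.09) * 10.0 * cimt_mm  # cIMT term
        + math.log(1.14) * cfpwv_m_s     # cfPWV term
    )
    return weighted_sum * 39.1 + 4.76


if __name__ == "__main__":
    # Illustrative inputs only, not data from the study.
    print(round(vascular_aging_index(0.7, 8.0), 2))
```

Under this reading, the index increases with both measurements, so a subject is flagged as EVA in step 2 when the result exceeds the age- and sex-specific 75th percentile.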

18 pages, 4490 KiB  
Article
DeepLabv3+-Based Segmentation and Best Features Selection Using Slime Mould Algorithm for Multi-Class Skin Lesion Classification
by Mehwish Zafar, Javeria Amin, Muhammad Sharif, Muhammad Almas Anjum, Ghulam Ali Mallah and Seifedine Kadry
Mathematics 2023, 11(2), 364; https://doi.org/10.3390/math11020364 - 10 Jan 2023
Cited by 25 | Viewed by 3347
Abstract
The development of abnormal cell growth is caused by different pathological alterations and some genetic disorders. This alteration in skin cells is dangerous and life-threatening, and its timely identification is essential for better treatment and safe cure. Therefore, in the present article, an approach is proposed for skin lesions’ segmentation and classification. In the proposed segmentation framework, pre-trained MobileNetV2 is used as the backbone of the DeepLabv3+ model and trained on the optimum parameters that provide significant improvement for infected skin lesions’ segmentation. The multi-classification of the skin lesions is carried out through feature extraction from pre-trained DenseNet201 with N × 1000 dimension, out of which informative features are selected by the Slime Mould Algorithm (SMA) and input to SVM and KNN classifiers. The proposed method provided a mean ROC of 0.95 ± 0.03 on MED-Node, 0.97 ± 0.04 on PH2, 0.98 ± 0.02 on HAM-10000, and 0.97 ± 0.00 on ISIC-2019 datasets.

15 pages, 2852 KiB  
Article
DExMA: An R Package for Performing Gene Expression Meta-Analysis with Missing Genes
by Juan Antonio Villatoro-García, Jordi Martorell-Marugán, Daniel Toro-Domínguez, Yolanda Román-Montoya, Pedro Femia and Pedro Carmona-Sáez
Mathematics 2022, 10(18), 3376; https://doi.org/10.3390/math10183376 - 17 Sep 2022
Cited by 3 | Viewed by 3239
Abstract
Meta-analysis techniques allow researchers to jointly analyse different studies to determine common effects. In the field of transcriptomics, these methods have gained popularity in recent years due to the increasing number of datasets that are available in public repositories. Despite this, there is a limited number of statistical software packages that implement proper meta-analysis functionalities for this type of data. This article describes DExMA, an R package that provides a set of functions for performing gene expression meta-analyses, from data downloading to results visualization. Additionally, we implemented functions to control the number of missing genes, which can be a major issue when comparing studies generated with different analytical platforms. DExMA is freely available in the Bioconductor repository.

14 pages, 2310 KiB  
Article
Preliminary Results on the Preinduction Cervix Status by Shear Wave Elastography
by Jorge Torres, María Muñoz, María Del Carmen Porcel, Sofía Contreras, Francisca Sonia Molina, Guillermo Rus, Olga Ocón-Hernández and Juan Melchor
Mathematics 2022, 10(17), 3164; https://doi.org/10.3390/math10173164 - 2 Sep 2022
Cited by 1 | Viewed by 1820
Abstract
The mechanical status of the cervix is a key physiological element during pregnancy. By considering a successful induction when the active phase of labor is achieved, mapping the mechanical properties of the cervix could have predictive potential for the management of induction protocols. In this sense, we performed a preliminary assessment of the diagnostic value of using shear wave elastography before labor induction in 54 women, considering the pregnancy outcome and Cesarean indications. Three anatomical cervix regions and standard methods, such as cervical length and Bishop score, were compared. To study the discriminatory power of each diagnostic method, a receiver operating characteristic curve was generated. Differences were observed using the external os region and cervical length in the failure to enter the active phase group compared to the vaginal delivery group (p < 0.05). The area under the ROC curve was 68.9%, 65.2% and 67.2% for external os, internal os and cervix box using elastography, respectively, compared to 69.5% for cervical length and 62.2% for Bishop score. External os elastography values have shown promise in predicting induction success. This a priori information could be used to prepare a study with a larger sample size, which would reduce the effect of any selection bias and increase the predictive power of elastography compared to other classical techniques.
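As a reminder of what the area under the ROC curve reported in this abstract measures, the sketch below computes it as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, counting ties as one half. The function name `roc_auc` and the example scores are hypothetical, and nothing here reproduces the study's data or software.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Equivalent to the Mann-Whitney U statistic divided by the number
    of positive-negative pairs; ties contribute one half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must contain at least one score")
    concordant = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos_scores) * len(neg_scores))
```

A value of 0.5 corresponds to a marker with no discriminatory power, and 1.0 to perfect separation of the two groups, which puts the reported values of roughly 62% to 70% in context.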
