Statistical Data Modeling and Machine Learning with Applications, 3rd Edition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: 15 December 2025 | Viewed by 13690

Special Issue Editor


Prof. Dr. Snezhana Gocheva-Ilieva
Guest Editor
Department of Mathematical Analysis, Faculty of Mathematics and Informatics, University of Plovdiv Paisii Hilendarski, 24 Tzar Assen St., 4000 Plovdiv, Bulgaria
Interests: computational statistics; applied mathematics; data mining; computer modeling in physics and engineering

Special Issue Information

Dear Colleagues,

Statistics and machine learning are two intertwined fields of mathematics and computer science. In recent years, very powerful classification and prediction methods have been developed in these fields. These advances open up substantial opportunities both for developing further methods and approaches and for applying them to solve practical problems effectively.

This Special Issue aims to publish review papers, research articles, and communications that present new original methods, applications, data analyses, case studies, comparative studies, and other results. Topics of interest include, but are not limited to, the theory and application of statistical data modeling and machine learning in diverse areas such as computer science, economics, industry, medicine, environmental sciences, forex and finance, education, engineering, marketing, and agriculture.

Prof. Dr. Snezhana Gocheva-Ilieva
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computational statistics
  • dimensionality reduction and variable selection
  • nonparametric statistical modeling
  • supervised learning (classification, regression)
  • clustering methods
  • financial statistics and econometrics
  • statistical algorithms
  • time series analysis and forecasting
  • machine learning algorithms
  • decision trees
  • ensemble methods
  • neural networks
  • deep learning
  • hybrid models
  • data analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (10 papers)


Research

23 pages, 830 KiB  
Article
Analyzing the Influence of Telematics-Based Pricing Strategies on Traditional Rating Factors in Auto Insurance Rate Regulation
by Shengkun Xie
Mathematics 2024, 12(19), 3150; https://doi.org/10.3390/math12193150 - 8 Oct 2024
Viewed by 752
Abstract
This study examines how telematics variables such as annual percentage driven, total miles driven, and driving patterns influence the distributional behaviour of conventional rating factors when incorporated into predictive models for capturing auto insurance risk in rate regulation. To effectively manage the complexity inherent in telematics data, we advocate for the adoption of non-negative sparse principal component analysis (NSPCA) as a structured approach for data dimensionality reduction. By emphasizing sparsity and non-negativity constraints, NSPCA enhances the interpretability and predictive power of models concerning both loss severity and claim counts. This methodological innovation aims to advance statistical analyses within insurance pricing frameworks, ensuring the robustness of predictive models and providing insights crucial for rate regulation strategies specific to the auto insurance sector. Results show that, to enhance auto insurance risk pricing models, it is essential to address data dimension reduction challenges when integrating telematics data variables. Our findings underscore that integrating telematics variables into predictive models maintains the integrity of risk relativity estimates associated with traditional policy variables.

26 pages, 2564 KiB  
Article
Multi-Task Forecasting of the Realized Volatilities of Agricultural Commodity Prices
by Rangan Gupta and Christian Pierdzioch
Mathematics 2024, 12(18), 2952; https://doi.org/10.3390/math12182952 - 23 Sep 2024
Viewed by 640
Abstract
Motivated by the comovement of realized volatilities (RVs) of agricultural commodity prices, we study whether multi-task forecasting algorithms improve the accuracy of out-of-sample forecasts of 15 agricultural commodities during the sample period from July 2015 to April 2023. We consider alternative multi-task stacking algorithms and variants of the multivariate Lasso estimator. We find evidence of in-sample predictability but scarce evidence that multi-task forecasting improves out-of-sample forecasts relative to a classic univariate heterogeneous autoregressive (HAR)-RV model. This lack of systematic evidence of out-of-sample forecasting gains is corroborated by extensive robustness checks, including an in-depth study of the quantiles of the distributions of the RVs and subsample periods that account for increases in the total spillovers among the RVs. We also study an extended model that features the RVs of energy commodities and precious metals, but our conclusions remain unaffected. Besides offering important lessons for future research, our results are interesting for financial market participants, who rely on accurate forecasts of RVs when solving portfolio optimization and derivatives pricing problems, and policymakers, who need accurate forecasts of RVs when designing policies to mitigate the potential adverse effects of a rise in the RVs of agricultural commodity prices and the concomitant economic and political uncertainty.

18 pages, 7996 KiB  
Article
Forecasting and Multilevel Early Warning of Wind Speed Using an Adaptive Kernel Estimator and Optimized Gated Recurrent Units
by Pengjiao Wang, Qiuliang Long, Hu Zhang, Xu Chen, Ran Yu and Fengqi Guo
Mathematics 2024, 12(16), 2581; https://doi.org/10.3390/math12162581 - 21 Aug 2024
Cited by 1 | Viewed by 587
Abstract
Accurately predicting wind speeds is of great significance in various engineering applications, such as the operation of high-speed trains. Machine learning models are effective in this field. However, existing studies generally provide deterministic predictions and utilize decomposition techniques in advance to enhance predictive performance, which may encounter data leakage and fail to capture the stochastic nature of wind data. This work proposes an advanced framework for the prediction and early warning of wind speeds by combining the optimized gated recurrent unit (GRU) and adaptive kernel density estimator (AKDE). Firstly, 12 samples (26,280 points each) were collected from an extensive open database. Three representative metaheuristic algorithms were then employed to optimize the parameters of diverse models, including extreme learning machines, a transformer model, and recurrent networks. The results yielded an optimal selection using the GRU and the crested porcupine optimizer. Afterwards, by using the AKDE, the joint probability density and cumulative distribution function of wind predictions and related predicting errors could be obtained. It was then applicable to calculate the conditional probability that actual wind speed exceeds the critical value, thereby providing probabilistic-based predictions in a multilevel manner. A comparison of the predictive performance of various methods and accuracy of subsequent decisions validated the proposed framework.

16 pages, 3803 KiB  
Article
Wind Energy Production in Italy: A Forecasting Approach Based on Fractional Brownian Motion and Generative Adversarial Networks
by Luca Di Persio, Nicola Fraccarolo and Andrea Veronese
Mathematics 2024, 12(13), 2105; https://doi.org/10.3390/math12132105 - 4 Jul 2024
Viewed by 679
Abstract
This paper focuses on developing a predictive model for wind energy production in Italy, aligning with the ambitious goals of the European Green Deal. In particular, by utilising real data from the SUD (South) Italian electricity zone over seven years, the model employs stochastic differential equations driven by (fractional) Brownian motion-based dynamic and generative adversarial networks to forecast wind energy production up to one week ahead accurately. Numerical simulations demonstrate the model’s effectiveness in capturing the complexities of wind energy prediction.

24 pages, 5099 KiB  
Article
Predicting Compressive Strength of High-Performance Concrete Using Hybridization of Nature-Inspired Metaheuristic and Gradient Boosting Machine
by Nhat-Duc Hoang, Van-Duc Tran and Xuan-Linh Tran
Mathematics 2024, 12(8), 1267; https://doi.org/10.3390/math12081267 - 22 Apr 2024
Cited by 4 | Viewed by 1177
Abstract
This study proposes a novel integration of the Extreme Gradient Boosting Machine (XGBoost) and Differential Flower Pollination (DFP) for constructing an intelligent method to predict the compressive strength (CS) of high-performance concrete (HPC) mixes. The former is employed to generalize a mapping function between the mechanical property of concrete and its influencing factors. DFP, as a metaheuristic algorithm, is employed to optimize the learning phase of XGBoost and reach a fine balance between the two goals of model building: reducing the prediction error and maximizing the generalization capability. To construct the proposed method, a historical dataset consisting of 400 samples was collected from previous studies. The model’s performance is reliably assessed via multiple experiments and Wilcoxon signed-rank tests. The hybrid DFP-XGBoost is able to achieve good predictive outcomes with a root mean square error of 5.27, a mean absolute percentage error of 6.74%, and a coefficient of determination of 0.94. Additionally, quantile regression based on XGBoost is performed to construct interval predictions of the CS of HPC. Notably, an asymmetric error loss is used to diminish overestimations committed by the model. It was found that this loss function successfully reduced the percentage of overestimated CS values from 47.1% to 27.5%. Hence, DFP-XGBoost can be a promising approach for accurately and reliably estimating the CS of untested HPC mixes.

15 pages, 5583 KiB  
Article
Hybrid Model of Natural Time Series with Neural Network Component and Adaptive Nonlinear Scheme: Application for Anomaly Detection
by Oksana Mandrikova and Bogdana Mandrikova
Mathematics 2024, 12(7), 1079; https://doi.org/10.3390/math12071079 - 3 Apr 2024
Cited by 2 | Viewed by 932
Abstract
It is often difficult to describe natural time series due to implicit dependences and correlated noise. During anomalous natural processes, anomalous features appear in data. They have a nonstationary structure and do not allow us to apply traditional methods for time series modeling. In order to solve these problems, new models, adequately describing natural data, are required. A new hybrid model of a time series (HMTS) with a nonstationary structure is proposed in this paper. The HMTS has regular and anomalous components. The HMTS regular component is determined on the basis of an autoencoder neural network. To describe the HMTS anomalous component, an adaptive nonlinear approximating scheme (ANAS) is used on a wavelet basis. HMTS is considered in this investigation for the problem of neutron monitor data modeling and anomaly detection. Anomalies in neutron monitor data indicate negative factors in space weather. The timely detection of these factors is critically important. This investigation showed that the developed HMTS adequately describes neutron monitor data and has satisfactory results from the point of view of numeric performance. The MSE model values are close to 0 and errors are white Gaussian noise. In order to optimize the estimate of the HMTS anomalous component, the likelihood ratio test was applied. Moreover, the wavelet basis, giving the least losses during ANAS construction, was determined. Statistical modeling results showed that HMTS provides a high accuracy of anomaly detection. When the signal/noise ratio is 1.3 and anomaly durations are more than 60 counts, the probability of their detection is close to 90%. This is a high rate in the problem domain under consideration and provides solution reliability of the problem of anomaly detection in neutron monitor data. Moreover, the processing of data from several neutron monitor stations showed the high sensitivity of the HMTS. This shows the possibility to minimize the number of engaged stations, maintaining anomaly detection accuracy compared to the global survey method widely used in this field. This result is important as the continuous operation of neutron monitor stations is not always provided. Thus, the results show that the developed HMTS has the potential to address the problem of anomaly detection in neutron monitor data even when the number of operating stations is small. The proposed HMTS can help us to decrease the risks of the negative impact of space weather anomalies on human health and modern infrastructure.

17 pages, 1883 KiB  
Article
Analysis of a Predictive Mathematical Model of Weather Changes Based on Neural Networks
by Boris V. Malozyomov, Nikita V. Martyushev, Svetlana N. Sorokova, Egor A. Efremenkov, Denis V. Valuev and Mengxu Qi
Mathematics 2024, 12(3), 480; https://doi.org/10.3390/math12030480 - 2 Feb 2024
Cited by 12 | Viewed by 2779
Abstract
In this paper, we investigate mathematical models of meteorological forecasting based on the work of neural networks, which allow us to calculate presumptive meteorological parameters of the desired location on the basis of previous meteorological data. A new method of grouping neural networks to obtain a more accurate output result is proposed. An algorithm is presented, based on which the most accurate meteorological forecast was obtained based on the results of the study. This algorithm can be used in a wide range of situations, such as obtaining data for the operation of equipment in a given location and studying meteorological parameters of the location. To build this model, we used data obtained from personal weather stations of the Weather Underground company and the US National Digital Forecast Database (NDFD). Also, a Google remote learning machine was used to compare the results with existing products on the market. The algorithm for building the forecast model covered several locations across the US in order to compare its performance in different weather zones. Different methods of training the machine to produce the most effective weather forecast result were also considered.

14 pages, 613 KiB  
Article
Beyond Traditional Assessment: A Fuzzy Logic-Infused Hybrid Approach to Equitable Proficiency Evaluation via Online Practice Tests
by Todorka Glushkova, Vanya Ivanova and Boyan Zlatanov
Mathematics 2024, 12(3), 371; https://doi.org/10.3390/math12030371 - 24 Jan 2024
Cited by 1 | Viewed by 983
Abstract
This article presents a hybrid approach to assessing students’ foreign language proficiency in a cyber–physical educational environment. It focuses on the advantages of the integrated assessment of student knowledge by considering the impact of automatic assessment, learners’ independent work, and their achievements to date. An assessment approach is described using the mathematical theory of fuzzy functions, which are employed to ensure the fair evaluation of students. The largest possible number of students whose reevaluation of test results will not affect the overall performance of the student group is automatically determined. The study also models the assessment process in the cyber–physical educational environment through the formal semantics of calculus of context-aware ambients (CCAs).

13 pages, 2193 KiB  
Article
Enhanced Checkerboard Detection Using Gaussian Processes
by Michaël Hillen, Ivan De Boi, Thomas De Kerf, Seppe Sels, Edgar Cardenas De La Hoz, Jona Gladines, Gunther Steenackers, Rudi Penne and Steve Vanlanduit
Mathematics 2023, 11(22), 4568; https://doi.org/10.3390/math11224568 - 7 Nov 2023
Viewed by 1831
Abstract
Accurate checkerboard detection is of vital importance for computer vision applications, and a variety of checkerboard detectors have been developed in the past decades. While some detectors are able to handle partially occluded checkerboards, they fail when a large occlusion completely divides the checkerboard. We propose a new checkerboard detection pipeline for occluded checkerboards that has a robust performance under varying levels of noise, blurring, and distortion, and for a variety of imaging modalities. This pipeline consists of a checkerboard detector and checkerboard enhancement with Gaussian processes (GP). By learning a mapping from local board coordinates to image pixel coordinates via a Gaussian process, we can fill in occluded corners, expand the board beyond the image borders, allocate detected corners that do not fit an initial grid, and remove noise on the detected corner locations. We show that our method can improve the performance of other publicly available state-of-the-art checkerboard detectors, both in terms of accuracy and the number of corners detected. Our code and datasets are made publicly available. The checkerboard detector pipeline is contained within our Python checkerboard detection library, called PyCBD. The pipeline itself is modular and easy to adapt to different use cases.

15 pages, 2194 KiB  
Article
Enhancing Medical Image Segmentation: Ground Truth Optimization through Evaluating Uncertainty in Expert Annotations
by Georgios Athanasiou, Josep Lluis Arcos and Jesus Cerquides
Mathematics 2023, 11(17), 3771; https://doi.org/10.3390/math11173771 - 2 Sep 2023
Cited by 2 | Viewed by 2414
Abstract
The surge of supervised learning methods for segmentation lately has underscored the critical role of label quality in predicting performance. This issue is prevalent in the domain of medical imaging, where high annotation costs and inter-observer variability pose significant challenges. Acquiring labels commonly involves multiple experts providing their interpretations of the “true” segmentation labels, each influenced by their individual biases. The blind acceptance of these noisy labels as the ground truth restricts the potential effectiveness of segmentation algorithms. Here, we apply coupled convolutional neural network approaches to a small-sized real-world dataset of bovine cumulus oocyte complexes. This is the first time these methods have been applied to a real-world annotation medical dataset, since they were previously tested only on artificially generated labels of medical and non-medical datasets. This dataset is crucial for healthy embryo development. Its application revealed an important challenge: the inability to effectively learn distinct confusion matrices for each expert due to large areas of agreement. In response, we propose a novel method that focuses on areas of high uncertainty. This approach allows us to understand the individual characteristics better, extract their behavior, and use this insight to create a more sophisticated ground truth using maximum likelihood. These findings contribute to the ongoing discussion of leveraging machine learning algorithms for medical image segmentation, particularly in scenarios involving multiple human annotators.
