Mathematics

Research

16 pages, 1468 KiB

Open Access | Feature Paper | Article

Probabilistic Forecasting of Crude Oil Prices Using Conditional Generative Adversarial Network Model with Lévy Process

by Mohammed Alruqimi and Luca Di Persio

Mathematics 2025, 13(2), 307; https://doi.org/10.3390/math13020307 - 18 Jan 2025

Viewed by 620

Accurate crude oil price forecasting is essential, given oil’s critical role in the global economy. However, the crude oil market is significantly influenced by external, transient events, which makes it difficult to capture the complex dynamics and uncertainties of price fluctuations. Traditional time series forecasting models, such as ARIMA and LSTM, often rely on assumptions regarding data structure, limiting their flexibility to estimate volatility or account for external shocks effectively. Recent research highlights Generative Adversarial Networks (GANs) as a promising alternative for capturing intricate patterns in time series data by leveraging the adversarial learning framework. This paper introduces a Crude Oil-Driven Conditional GAN (CO-CGAN), a hybrid model for enhancing crude oil price forecasting by combining advanced AI frameworks (GANs), oil market sentiment analysis, and stochastic jump-diffusion models. By employing conditional supervised training, the inherent structure of the data distribution is preserved, thereby enabling more accurate and reliable probabilistic price forecasts. Additionally, the CO-CGAN integrates a Lévy process and sentiment features to better account for uncertainties and price shocks in the crude oil market. Experimental evaluations on two real-world oil price datasets demonstrate the superior performance of the proposed model, which achieves a Mean Squared Error (MSE) of 0.000054 and outperforms benchmark models.
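
To make the jump-diffusion idea concrete, the following is a minimal sketch of a Merton-style process, one common example of a Lévy process with jumps. It is not the CO-CGAN and is not taken from the article: the function names, parameter names, and parameter values are all illustrative assumptions. The log-price follows Brownian motion with drift, plus normally distributed jumps arriving at Poisson times.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_jump_diffusion(s0, mu, sigma, jump_rate, jump_mean, jump_std,
                            n_steps, dt, seed=0):
    """Simulate one price path: Brownian log-price plus compound Poisson jumps.

    All parameters are hypothetical illustration values, not fitted to oil data.
    """
    rng = random.Random(seed)
    log_price = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        # Continuous part: drift (with Ito correction) plus Gaussian shock.
        diffusion = (mu - 0.5 * sigma ** 2) * dt
        diffusion += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Jump part: a Poisson number of normally distributed log-jumps.
        n_jumps = poisson_draw(rng, jump_rate * dt)
        jumps = sum(rng.gauss(jump_mean, jump_std) for _ in range(n_jumps))
        log_price += diffusion + jumps
        path.append(math.exp(log_price))
    return path


if __name__ == "__main__":
    prices = simulate_jump_diffusion(80.0, 0.05, 0.3, 5.0, -0.02, 0.05,
                                     n_steps=252, dt=1 / 252, seed=1)
    print(len(prices), min(prices), max(prices))
```

Running many such paths with different seeds gives an empirical distribution of future prices, which is the kind of probabilistic forecast the abstract refers to. How the article couples the Lévy component to the GAN is not specified in the abstract.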

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)

23 pages, 5589 KiB

Open Access | Article

Cauchy–Logistic Unit Distribution: Properties and Application in Modeling Data Extremes

by Vladica S. Stojanović, Tanja Jovanović Spasojević, Radica Bojičić, Brankica Pažun and Zlatko Langović

Mathematics 2025, 13(2), 255; https://doi.org/10.3390/math13020255 - 14 Jan 2025

Viewed by 451

Abstract

This manuscript deals with a novel two-parameter stochastic distribution, obtained by transforming the Cauchy distribution into the unit interval using a generalized logistic mapping. In this way, according to the well-known properties of the Cauchy distribution, a unit random variable with significantly accentuated values at the ends of the unit interval is obtained. Therefore, the proposed stochastic distribution, named the Cauchy–logistic unit (CLU) distribution, represents a stochastic model that may be suitable for modeling phenomena and processes with emphasized extreme values. Key stochastic properties of the CLU distribution are examined, such as moments, entropy, modality, and symmetry conditions. In addition, a quantile-based parameter estimation procedure is proposed, together with an asymptotic analysis of the resulting estimators and a Monte Carlo simulation study. Finally, the application of the proposed distribution in stochastic modeling of some real-world data with emphasized extreme values is provided.
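
The construction can be illustrated with a short sketch, under stated assumptions: it uses the standard logistic function as the mapping and the Cauchy location and scale as the two parameters. The article's generalized logistic mapping and its parametrization may differ, and all function names here are hypothetical.

```python
import math
import random


def logistic(x):
    """Numerically stable logistic map from the real line into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_unit_cauchy_logistic(n, location=0.0, scale=1.0, seed=0):
    """Draw Cauchy variates by inverse-CDF sampling and map them to the unit interval."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = location + scale * math.tan(math.pi * (rng.random() - 0.5))
        samples.append(logistic(x))
    return samples


def tail_fraction(samples, eps=0.1):
    """Share of samples within eps of either end of the unit interval."""
    return sum(1 for u in samples if u < eps or u > 1.0 - eps) / len(samples)


if __name__ == "__main__":
    for scale in (1.0, 5.0):
        u = sample_unit_cauchy_logistic(20000, scale=scale, seed=42)
        print(scale, round(tail_fraction(u), 3))
```

A uniform variable would place 20% of its mass in the two end bands of width 0.1. The heavy Cauchy tails push noticeably more mass there, and increasingly so as the scale grows, which matches the "accentuated values at the ends" described in the abstract.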

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)

23 pages, 830 KiB

Open Access | Article

Analyzing the Influence of Telematics-Based Pricing Strategies on Traditional Rating Factors in Auto Insurance Rate Regulation

by Shengkun Xie

Mathematics 2024, 12(19), 3150; https://doi.org/10.3390/math12193150 - 8 Oct 2024

Viewed by 1324

Abstract

This study examines how telematics variables such as annual percentage driven, total miles driven, and driving patterns influence the distributional behaviour of conventional rating factors when incorporated into predictive models for capturing auto insurance risk in rate regulation. To effectively manage the complexity inherent in telematics data, we advocate adopting non-negative sparse principal component analysis (NSPCA) as a structured approach to data dimensionality reduction. By emphasizing sparsity and non-negativity constraints, NSPCA enhances the interpretability and predictive power of models concerning both loss severity and claim counts. This methodological innovation aims to advance statistical analyses within insurance pricing frameworks, ensuring the robustness of predictive models and providing insights crucial for rate regulation strategies specific to the auto insurance sector. Results show that, to enhance auto insurance risk pricing models, it is essential to address data dimension reduction challenges when integrating telematics data variables. Our findings underscore that integrating telematics variables into predictive models maintains the integrity of risk relativity estimates associated with traditional policy variables.

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)

26 pages, 2564 KiB

Open Access | Article

Multi-Task Forecasting of the Realized Volatilities of Agricultural Commodity Prices

by Rangan Gupta and Christian Pierdzioch

Mathematics 2024, 12(18), 2952; https://doi.org/10.3390/math12182952 - 23 Sep 2024

Viewed by 830

Abstract

Motivated by the comovement of realized volatilities (RVs) of agricultural commodity prices, we study whether multi-task forecasting algorithms improve the accuracy of out-of-sample forecasts of 15 agricultural commodities during the sample period from July 2015 to April 2023. We consider alternative multi-task stacking algorithms and variants of the multivariate Lasso estimator. We find evidence of in-sample predictability but little evidence that multi-task forecasting improves out-of-sample forecasts relative to a classic univariate heterogeneous autoregressive (HAR)-RV model. This lack of systematic evidence of out-of-sample forecasting gains is corroborated by extensive robustness checks, including an in-depth study of the quantiles of the distributions of the RVs and subsample periods that account for increases in the total spillovers among the RVs. We also study an extended model that features the RVs of energy commodities and precious metals, but our conclusions remain unaffected. Besides offering important lessons for future research, our results are of interest to financial market participants, who rely on accurate forecasts of RVs when solving portfolio optimization and derivatives pricing problems, and to policymakers, who need accurate forecasts of RVs when designing policies to mitigate the potential adverse effects of a rise in the RVs of agricultural commodity prices and the concomitant economic and political uncertainty.
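
For readers unfamiliar with the HAR-RV benchmark, the sketch below builds its regressors. It assumes the conventional windows from the HAR literature (1 day, 5 trading days, 22 trading days); the abstract does not state which windows the article uses, and the function name is hypothetical. Each row pairs the current daily, weekly-average, and monthly-average RV with the next day's RV, which the model then regresses by ordinary least squares.

```python
def har_features(rv, weekly=5, monthly=22):
    """Build (daily, weekly average, monthly average, next-day target) rows.

    rv is a list of realized volatilities in time order. The window lengths
    are the conventional HAR choices and are an assumption here.
    """
    rows = []
    for t in range(monthly - 1, len(rv) - 1):
        daily = rv[t]
        weekly_avg = sum(rv[t - weekly + 1:t + 1]) / weekly
        monthly_avg = sum(rv[t - monthly + 1:t + 1]) / monthly
        rows.append((daily, weekly_avg, monthly_avg, rv[t + 1]))
    return rows


if __name__ == "__main__":
    series = [float(i) for i in range(1, 31)]  # toy data, not real RVs
    for row in har_features(series)[:3]:
        print(row)
```

A multi-task variant would fit such regressions for all 15 commodities jointly rather than one at a time, which is the comparison the abstract describes.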

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)

18 pages, 7996 KiB

Open Access | Article

Forecasting and Multilevel Early Warning of Wind Speed Using an Adaptive Kernel Estimator and Optimized Gated Recurrent Units

by Pengjiao Wang, Qiuliang Long, Hu Zhang, Xu Chen, Ran Yu and Fengqi Guo

Mathematics 2024, 12(16), 2581; https://doi.org/10.3390/math12162581 - 21 Aug 2024

Cited by 1 | Viewed by 802

Abstract

Accurately predicting wind speeds is of great significance in various engineering applications, such as the operation of high-speed trains. Machine learning models are effective in this field. However, existing studies generally provide deterministic predictions and apply decomposition techniques in advance to enhance predictive performance, which may cause data leakage and fail to capture the stochastic nature of wind data. This work proposes an advanced framework for the prediction and early warning of wind speeds by combining an optimized gated recurrent unit (GRU) and an adaptive kernel density estimator (AKDE). Firstly, 12 samples (26,280 points each) were collected from an extensive open database. Three representative metaheuristic algorithms were then employed to optimize the parameters of diverse models, including extreme learning machines, a transformer model, and recurrent networks. The results indicated that the optimal combination was the GRU with the crested porcupine optimizer. Afterwards, the AKDE was used to obtain the joint probability density and cumulative distribution function of the wind predictions and the associated prediction errors. This made it possible to calculate the conditional probability that the actual wind speed exceeds a critical value, thereby providing probability-based predictions in a multilevel manner. A comparison of the predictive performance of various methods and of the accuracy of subsequent decisions validated the proposed framework.
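
The exceedance-probability step can be sketched in simplified form. The code below is not the article's AKDE: it assumes a fixed-bandwidth Gaussian kernel density estimate of past prediction errors (actual minus predicted), with Silverman's rule of thumb for the bandwidth, and it treats the error as independent of the prediction. The function names and the sample numbers are hypothetical.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def silverman_bandwidth(errors):
    """Rule-of-thumb bandwidth for a Gaussian kernel (fixed, not adaptive)."""
    return 1.06 * statistics.stdev(errors) * len(errors) ** (-0.2)


def exceedance_probability(prediction, critical, errors, bandwidth=None):
    """Estimate P(actual > critical), with actual = prediction + error.

    The error density is a Gaussian KDE over past errors, so the tail
    probability is the average of the kernels' individual tail probabilities.
    """
    h = bandwidth if bandwidth is not None else silverman_bandwidth(errors)
    gap = critical - prediction
    return sum(1.0 - normal_cdf((gap - e) / h) for e in errors) / len(errors)


if __name__ == "__main__":
    past_errors = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5, -0.7, 0.2]  # toy values
    for level in (14.0, 15.0, 16.0):  # hypothetical warning thresholds in m/s
        print(level, round(exceedance_probability(14.5, level, past_errors), 3))
```

Evaluating the probability at several critical values, as in the loop above, is one way to obtain a multilevel warning: each level is triggered when its exceedance probability passes a chosen cutoff.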

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)

16 pages, 3803 KiB

Open Access | Article

Wind Energy Production in Italy: A Forecasting Approach Based on Fractional Brownian Motion and Generative Adversarial Networks

by Luca Di Persio, Nicola Fraccarolo and Andrea Veronese

Mathematics 2024, 12(13), 2105; https://doi.org/10.3390/math12132105 - 4 Jul 2024

Viewed by 862

Abstract

This paper focuses on developing a predictive model for wind energy production in Italy, aligning with the ambitious goals of the European Green Deal. In particular, using seven years of real data from the SUD (South) Italian electricity zone, the model combines stochastic differential equations driven by (fractional) Brownian motion with generative adversarial networks to accurately forecast wind energy production up to one week ahead. Numerical simulations demonstrate the model’s effectiveness in capturing the complexities of wind energy prediction.
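
As background on the driving noise, the sketch below simulates a fractional Brownian motion path by Cholesky factorization of its covariance, Cov(B_s, B_t) = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2, where H is the Hurst exponent. This is a generic textbook construction, not the article's model; the function names are hypothetical, and the pure-Python factorization is only practical for short paths.

```python
import math
import random


def fbm_covariance(s, t, hurst):
    """Covariance of fractional Brownian motion at times s and t."""
    return 0.5 * (s ** (2 * hurst) + t ** (2 * hurst) - abs(t - s) ** (2 * hurst))


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulate_fbm(n_steps, hurst, horizon=1.0, seed=0):
    """Return one path (starting at 0) on an evenly spaced grid over [0, horizon]."""
    rng = random.Random(seed)
    times = [horizon * (i + 1) / n_steps for i in range(n_steps)]
    cov = [[fbm_covariance(s, t, hurst) for t in times] for s in times]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    values = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n_steps)]
    return [0.0] + values


if __name__ == "__main__":
    path = simulate_fbm(50, hurst=0.7, seed=3)
    print(path[:5])
```

With H = 0.5 the covariance reduces to min(s, t), which is ordinary Brownian motion; H above 0.5 gives positively correlated increments, one reason fractional noise is considered for persistent series such as wind production.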

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)

24 pages, 5099 KiB

Open Access | Article

Predicting Compressive Strength of High-Performance Concrete Using Hybridization of Nature-Inspired Metaheuristic and Gradient Boosting Machine

by Nhat-Duc Hoang, Van-Duc Tran and Xuan-Linh Tran

Mathematics 2024, 12(8), 1267; https://doi.org/10.3390/math12081267 - 22 Apr 2024

Cited by 5 | Viewed by 1447

Abstract

This study proposes a novel integration of the Extreme Gradient Boosting Machine (XGBoost) and Differential Flower Pollination (DFP) for constructing an intelligent method to predict the compressive strength (CS) of high-performance concrete (HPC) mixes. The former is employed to generalize a mapping function between the mechanical property of concrete and its influencing factors. DFP, as a metaheuristic algorithm, is employed to optimize the learning phase of XGBoost and reach a fine balance between the two goals of model building: reducing the prediction error and maximizing the generalization capability. To construct the proposed method, a historical dataset consisting of 400 samples was collected from previous studies. The model’s performance is reliably assessed via multiple experiments and Wilcoxon signed-rank tests. The hybrid DFP-XGBoost achieves good predictive outcomes, with a root mean square error of 5.27, a mean absolute percentage error of 6.74%, and a coefficient of determination of 0.94. Additionally, quantile regression based on XGBoost is performed to construct interval predictions of the CS of HPC. Notably, an asymmetric error loss is used to reduce overestimation by the model. It was found that this loss function successfully reduced the percentage of overestimated CS values from 47.1% to 27.5%. Hence, DFP-XGBoost can be a promising approach for accurately and reliably estimating the CS of untested HPC mixes.
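
The effect of an asymmetric loss can be seen with the pinball (quantile) loss, the standard objective for quantile regression. This is offered as a stand-in under assumption: the abstract does not give the exact form of the article's asymmetric error loss, and the function names and numbers below are illustrative.

```python
def pinball_loss(y_true, y_pred, tau):
    """Quantile loss for one observation at quantile level tau in (0, 1)."""
    diff = y_true - y_pred
    return max(tau * diff, (tau - 1.0) * diff)


def mean_pinball_loss(y_true, y_pred, tau):
    """Average pinball loss over paired sequences of targets and predictions."""
    return sum(pinball_loss(y, p, tau) for y, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical true strength of 60 MPa, missed by 5 MPa in each direction.
    for tau in (0.5, 0.2):
        over = pinball_loss(60.0, 65.0, tau)   # prediction too high
        under = pinball_loss(60.0, 55.0, tau)  # prediction too low
        print(tau, over, under)
```

At a level of 0.5 both errors cost the same. At 0.2 an overestimate costs four times as much as an equal underestimate, so a model trained on this loss shifts its predictions downward, which is the qualitative behaviour behind the reported drop in overestimated values.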

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)

15 pages, 5583 KiB

Open Access | Article

Hybrid Model of Natural Time Series with Neural Network Component and Adaptive Nonlinear Scheme: Application for Anomaly Detection

by Oksana Mandrikova and Bogdana Mandrikova

Mathematics 2024, 12(7), 1079; https://doi.org/10.3390/math12071079 - 3 Apr 2024

Cited by 2 | Viewed by 1083

Abstract

It is often difficult to describe natural time series due to implicit dependencies and correlated noise. During anomalous natural processes, anomalous features appear in data. They have a nonstationary structure and do not allow us to apply traditional methods for time series modeling. In order to solve these problems, new models that adequately describe natural data are required. A new hybrid model of a time series (HMTS) with a nonstationary structure is proposed in this paper. The HMTS has regular and anomalous components. The HMTS regular component is determined on the basis of an autoencoder neural network. To describe the HMTS anomalous component, an adaptive nonlinear approximating scheme (ANAS) is used on a wavelet basis. The HMTS is considered in this investigation for the problem of neutron monitor data modeling and anomaly detection. Anomalies in neutron monitor data indicate negative factors in space weather. The timely detection of these factors is critically important. This investigation showed that the developed HMTS adequately describes neutron monitor data and performs satisfactorily in numerical terms. The model’s MSE values are close to 0, and the errors are white Gaussian noise. In order to optimize the estimate of the HMTS anomalous component, the likelihood ratio test was applied. Moreover, the wavelet basis giving the least losses during ANAS construction was determined. Statistical modeling results showed that the HMTS provides a high accuracy of anomaly detection. When the signal/noise ratio is 1.3 and anomaly durations are more than 60 counts, the probability of their detection is close to 90%. This is a high rate in the problem domain under consideration and makes the solution to the problem of anomaly detection in neutron monitor data reliable. Moreover, the processing of data from several neutron monitor stations showed the high sensitivity of the HMTS. This shows that the number of engaged stations can be minimized while maintaining anomaly detection accuracy relative to the global survey method widely used in this field. This result is important, as the continuous operation of neutron monitor stations is not always ensured. Thus, the results show that the developed HMTS has the potential to address the problem of anomaly detection in neutron monitor data even when the number of operating stations is small. The proposed HMTS can help us to decrease the risks of the negative impact of space weather anomalies on human health and modern infrastructure.

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)

17 pages, 1883 KiB

Open Access | Article

Analysis of a Predictive Mathematical Model of Weather Changes Based on Neural Networks

by Boris V. Malozyomov, Nikita V. Martyushev, Svetlana N. Sorokova, Egor A. Efremenkov, Denis V. Valuev and Mengxu Qi

Mathematics 2024, 12(3), 480; https://doi.org/10.3390/math12030480 - 2 Feb 2024

Cited by 13 | Viewed by 3452

Abstract

In this paper, we investigate mathematical models of meteorological forecasting based on neural networks, which allow us to calculate predicted meteorological parameters for a desired location on the basis of previous meteorological data. A new method of grouping neural networks to obtain a more accurate output is proposed. An algorithm is presented that produced the most accurate meteorological forecast in the study. This algorithm can be used in a wide range of situations, such as obtaining data for the operation of equipment in a given location and studying meteorological parameters of the location. To build this model, we used data obtained from personal weather stations of the Weather Underground company and the US National Digital Forecast Database (NDFD). A Google remote learning machine was also used to compare the results with existing products on the market. The algorithm for building the forecast model covered several locations across the US in order to compare its performance in different weather zones. Different methods of training the machine to produce the most effective weather forecast were also considered.

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)

14 pages, 613 KiB

Open Access | Feature Paper | Article

Beyond Traditional Assessment: A Fuzzy Logic-Infused Hybrid Approach to Equitable Proficiency Evaluation via Online Practice Tests

by Todorka Glushkova, Vanya Ivanova and Boyan Zlatanov

Mathematics 2024, 12(3), 371; https://doi.org/10.3390/math12030371 - 24 Jan 2024

Cited by 1 | Viewed by 1116

Abstract

This article presents a hybrid approach to assessing students’ foreign language proficiency in a cyber–physical educational environment. It focuses on the advantages of the integrated assessment of student knowledge by considering the impact of automatic assessment, learners’ independent work, and their achievements to date. An assessment approach is described using the mathematical theory of fuzzy functions, which are employed to ensure the fair evaluation of students. The approach automatically determines the largest possible number of students whose test results can be reevaluated without affecting the overall performance of the student group. The study also models the assessment process in the cyber–physical educational environment through the formal semantics of the calculus of context-aware ambients (CCA).

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)

13 pages, 2193 KiB

Open Access | Article

Enhanced Checkerboard Detection Using Gaussian Processes

by Michaël Hillen, Ivan De Boi, Thomas De Kerf, Seppe Sels, Edgar Cardenas De La Hoz, Jona Gladines, Gunther Steenackers, Rudi Penne and Steve Vanlanduit

Mathematics 2023, 11(22), 4568; https://doi.org/10.3390/math11224568 - 7 Nov 2023

Viewed by 2148

Abstract

Accurate checkerboard detection is of vital importance for computer vision applications, and a variety of checkerboard detectors have been developed in the past decades. While some detectors are able to handle partially occluded checkerboards, they fail when a large occlusion completely divides the checkerboard. We propose a new checkerboard detection pipeline for occluded checkerboards that performs robustly under varying levels of noise, blurring, and distortion, and for a variety of imaging modalities. This pipeline consists of a checkerboard detector and checkerboard enhancement with Gaussian processes (GP). By learning a mapping from local board coordinates to image pixel coordinates via a Gaussian process, we can fill in occluded corners, expand the board beyond the image borders, allocate detected corners that do not fit an initial grid, and remove noise on the detected corner locations. We show that our method can improve the performance of other publicly available state-of-the-art checkerboard detectors, both in terms of accuracy and the number of corners detected. Our code and datasets are made publicly available. The checkerboard detector pipeline is contained within our Python checkerboard detection library, called PyCBD. The pipeline itself is modular and easy to adapt to different use cases.

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)

15 pages, 2194 KiB

Open Access | Article

Enhancing Medical Image Segmentation: Ground Truth Optimization through Evaluating Uncertainty in Expert Annotations

by Georgios Athanasiou, Josep Lluis Arcos and Jesus Cerquides

Mathematics 2023, 11(17), 3771; https://doi.org/10.3390/math11173771 - 2 Sep 2023

Cited by 2 | Viewed by 2883

Abstract

The recent surge of supervised learning methods for segmentation has underscored the critical role of label quality in predictive performance. This issue is prevalent in the domain of medical imaging, where high annotation costs and inter-observer variability pose significant challenges. Acquiring labels commonly involves multiple experts providing their interpretations of the “true” segmentation labels, each influenced by their individual biases. The blind acceptance of these noisy labels as the ground truth restricts the potential effectiveness of segmentation algorithms. Here, we apply coupled convolutional neural network approaches to a small real-world dataset of bovine cumulus oocyte complexes, which are crucial for healthy embryo development. This is the first time these methods have been applied to a real-world annotated medical dataset, since they were previously tested only on artificially generated labels of medical and non-medical datasets. Their application revealed an important challenge: the inability to effectively learn distinct confusion matrices for each expert due to large areas of agreement. In response, we propose a novel method that focuses on areas of high uncertainty. This approach allows us to better understand the experts’ individual characteristics, extract their behavior, and use this insight to create a more sophisticated ground truth using maximum likelihood. These findings contribute to the ongoing discussion of leveraging machine learning algorithms for medical image segmentation, particularly in scenarios involving multiple human annotators.

(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications, 3rd Edition)
