
Stats, Volume 8, Issue 1 (March 2025) – 13 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
26 pages, 12216 KiB  
Article
Measurement and Decomposition Analysis of Occupational Income Inequality in China
by Jing Yuan, Teng Ma, Yinghui Wang and Zongwu Cai
Stats 2025, 8(1), 13; https://doi.org/10.3390/stats8010013 - 2 Feb 2025
Viewed by 183
Abstract
Using the China Family Panel Studies (CFPS) database, this paper measures the degree of intra-occupational inequality in China with the Pareto coefficient and uses the generalized entropy index to decompose the top income gap by region as well as by industry. The empirical results show that, firstly, the degree of income inequality between occupations in China has increased significantly in recent years. The provinces with a higher degree of income inequality between occupations are mostly located in the more economically developed regions in the central and eastern parts of the country, while the degree of inequality between occupations in the western part is lower. Secondly, the highest-income occupations are mainly in the manufacturing industry, with relatively high levels in the construction industry, the education sector, the wholesale and retail trade, and public administration and social organizations, while the levels in other occupations are relatively low. Lastly, the top income gap primarily originates from within industries. However, the contribution rate of the top income gap between industries is gradually increasing, while the contribution rate of the top income gap within industries is gradually decreasing. Full article
(This article belongs to the Section Financial Statistics)
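The regional and industry decomposition the abstract describes can be illustrated with the Theil T index, the member of the generalized entropy family whose total splits exactly into a within-group and a between-group term. The sketch below is a minimal illustration on invented incomes and group labels, not CFPS data:

```python
import math

def theil_t(incomes):
    """Theil T index: GE(1) = mean((x/mu) * log(x/mu))."""
    mu = sum(incomes) / len(incomes)
    return sum((x / mu) * math.log(x / mu) for x in incomes) / len(incomes)

def decompose(groups):
    """Split the total Theil T into within-group and between-group parts.

    groups: dict mapping a group label (e.g. a region) to a list of incomes.
    The within part is the income-share-weighted sum of group Theil indices;
    the between part measures inequality among the group means alone.
    """
    all_incomes = [x for g in groups.values() for x in g]
    n = len(all_incomes)
    mu = sum(all_incomes) / n
    within = between = 0.0
    for g in groups.values():
        share = sum(g) / (mu * n)        # group's share of total income
        mu_g = sum(g) / len(g)
        within += share * theil_t(g)
        between += share * math.log(mu_g / mu)
    return within, between

# Invented regional income samples (illustration only, not CFPS data)
groups = {"east": [30, 45, 80, 120], "west": [20, 25, 30, 40]}
w, b = decompose(groups)
total = theil_t([x for g in groups.values() for x in g])
assert abs((w + b) - total) < 1e-12  # the decomposition is exact for Theil T
```

Other GE(α) indices admit an analogous within/between split, with weights that mix income and population shares depending on α.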
17 pages, 1319 KiB  
Communication
Smart Renting: Harnessing Urban Data with Statistical and Machine Learning Methods for Predicting Property Rental Prices from a Tenant’s Perspective
by Francisco Louzada, Kleython José Coriolano Cavalcanti de Lacerda, Paulo Henrique Ferreira and Naomy Duarte Gomes
Stats 2025, 8(1), 12; https://doi.org/10.3390/stats8010012 - 27 Jan 2025
Viewed by 381
Abstract
The real estate market plays a pivotal role in most nations’ economies, showcasing continuous growth. Particularly noteworthy is the rapid expansion of the digital real estate sector, marked by innovations like 3D visualization and streamlined online contractual processes, a momentum further accelerated by the aftermath of the Coronavirus Disease 2019 (COVID-19) pandemic. Amidst this transformative landscape, artificial intelligence emerges as a vital force, addressing consumer needs by harnessing data analytics for predicting and monitoring rental prices. While studies have demonstrated the efficacy of machine learning (ML) algorithms such as decision trees and neural networks in predicting house prices, there is a lack of research specifically focused on rental property prices, a significant sector in Brazil due to the prohibitive costs associated with property acquisition. This study fills this crucial gap by delving into the intricacies of rental pricing, using data from the city of São Carlos-SP, Brazil. The research aims to analyze, model, and predict rental prices, employing an approach that incorporates diverse ML models. Through this analysis, our work showcases the potential of ML algorithms in accurately predicting rental house prices. Moreover, it envisions the practical application of this research with the development of a user-friendly website. This platform could revolutionize the renting experience, empowering both tenants and real estate agencies to estimate rental values based on specific property attributes and to access the associated statistics. Full article
32 pages, 3452 KiB  
Review
Assessment of Reliability Allocation Methods for Electronic Systems: A Systematic and Bibliometric Analysis
by Rajkumar B. Patil, San Kyeong, Michael Pecht, Rahul A. Gujar and Sandip Mane
Stats 2025, 8(1), 11; https://doi.org/10.3390/stats8010011 - 24 Jan 2025
Viewed by 766
Abstract
Reliability allocation is the process of assigning reliability targets to sub-systems within a system to meet the overall reliability requirements. However, many traditional reliability allocation methods rely on assumptions that are often unrealistic, leading to misleading, unachievable, and costly outcomes. This paper provides a historical review of reliability allocation methods, focusing on the Weighing Factor Method (WFM), with a detailed analysis of its main findings, assumptions, and limitations. Additionally, the review covers methods for reliability optimization, redundancy optimization, and multi-state system optimization, highlighting their strengths and shortcomings. A case study is presented to demonstrate how the assumption of an exponential distribution impacts the reliability allocation process, showing the limitations it imposes on practical implementations. Furthermore, a bibliometric analysis is conducted to assess publication trends in the field of reliability allocation. Through examples, particularly in the context of electronic systems using commercial off-the-shelf (COTS) components, the challenges are discussed, and recommendations for alternative approaches to improve the reliability allocation process are provided. Full article
(This article belongs to the Section Reliability Engineering)
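The exponential-distribution assumption examined in the case study drives even the simplest allocation rule, equal apportionment: for a series system of sub-systems with constant failure rates, a system reliability target factors cleanly into per-sub-system failure rates. A minimal sketch; the target, mission time, and sub-system count below are hypothetical:

```python
import math

def equal_apportionment(r_target, n, t):
    """Equal apportionment under the exponential assumption: a series
    system meets R(t) = r_target if each of its n sub-systems has the
    constant failure rate -ln(r_target) / (n * t)."""
    return -math.log(r_target) / (n * t)

# Hypothetical target: 0.95 system reliability over a 1000 h mission,
# with 4 sub-systems in series
lam = equal_apportionment(0.95, 4, 1000.0)
r_sub = math.exp(-lam * 1000.0)  # each sub-system's allocated reliability
r_sys = r_sub ** 4               # series system: product of sub-system reliabilities
assert abs(r_sys - 0.95) < 1e-12
```

The review's point is precisely that this tidy factorization disappears once sub-system lifetimes are not exponential, which is the usual situation for electronic systems built from COTS components.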
37 pages, 8999 KiB  
Article
An Improved Soft Island Model of the Fish School Search Algorithm with Exponential Step Decay Using Cluster-Based Population Initialization
by Liliya A. Demidova and Vladimir E. Zhuravlev
Stats 2025, 8(1), 10; https://doi.org/10.3390/stats8010010 - 22 Jan 2025
Viewed by 516
Abstract
Optimization is a highly relevant area of research due to its widespread applications. The development of new optimization algorithms or the improvement of existing ones enhances the efficiency of various fields of activity. In this paper, an improved Soft Island Model (SIM) is considered for the Tent-map-based Fish School Search algorithm with Exponential step decay (ETFSS). The proposed model is based on a probabilistic approach to realize the migration process relying on the statistics of the overall achievement of each island. In order to generate the initial population of the algorithm, a new initialization method is proposed in which all islands are formed in separate regions of the search space, thus forming clusters. For the presented SIM-ETFSS algorithm, numerical experiments with the optimization of classical test functions, as well as checks for the presence of known defects that lead to undesirable effects in problem solving, have been carried out. Tools such as the Mann–Whitney U test and box plots, among other statistical methods of data analysis, are used to evaluate the quality of the presented algorithm, demonstrating the superiority of SIM-ETFSS over its original version. The results obtained are analyzed and discussed. Full article
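The Mann–Whitney U test the authors use reduces to counting pairwise wins between two samples of results; a minimal self-contained sketch (the objective values below are invented, not results from the paper):

```python
def mann_whitney_u(a, b):
    """U statistic for sample a against sample b: the number of pairs
    (x in a, y in b) with x < y, counting ties as one half.
    With lower objective values being better, a large U favours a."""
    u = 0.0
    for x in a:
        for y in b:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Invented best-found objective values from two algorithm variants (lower = better)
base = [0.90, 0.85, 0.80, 0.95, 0.88]
improved = [0.60, 0.55, 0.70, 0.65, 0.58]
u = mann_whitney_u(improved, base)
# u == len(improved) * len(base) means every improved run beat every base run
assert u == 25.0
```

In practice one would use a library implementation (e.g. scipy.stats.mannwhitneyu) to obtain the p-value as well; the counting above is only the statistic itself.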
20 pages, 1008 KiB  
Article
Predicting and Mitigating Delays in Cross-Dock Operations: A Data-Driven Approach
by Amna Altaf, Adeel Mehmood, Adnen El Amraoui, François Delmotte and Christophe Lecoutre
Stats 2025, 8(1), 9; https://doi.org/10.3390/stats8010009 - 20 Jan 2025
Viewed by 382
Abstract
Cross-docking operations are highly dependent on precise scheduling and timely truck arrivals to ensure streamlined logistics and minimal storage costs. Predicting potential delays in truck arrivals is essential to avoiding disruptions that can propagate throughout the cross-dock facility. This paper investigates the effectiveness of deep learning models, including Convolutional Neural Networks (CNNs), Multilayer Perceptrons (MLPs), and Recurrent Neural Networks (RNNs), in predicting late arrivals of trucks. Through extensive comparative analysis, we evaluate the performance of each model in terms of prediction accuracy and applicability to real-world cross-docking requirements. The results highlight which models can most accurately predict delays, enabling proactive measures for handling deviations and improving operational efficiency. Our findings support the potential for deep learning models to enhance cross-docking reliability, ultimately contributing to optimized logistics and supply chain resilience. Full article
(This article belongs to the Section Reliability Engineering)
20 pages, 3127 KiB  
Article
A New Weighted Lindley Model with Applications to Extreme Historical Insurance Claims
by Morad Alizadeh, Mahmoud Afshari, Gauss M. Cordeiro, Ziaurrahman Ramaki, Javier E. Contreras-Reyes, Fatemeh Dirnik and Haitham M. Yousof
Stats 2025, 8(1), 8; https://doi.org/10.3390/stats8010008 - 15 Jan 2025
Viewed by 642
Abstract
In this paper, we propose a new weighted Lindley (NWLi) model for the analysis of extreme historical insurance claims. It extends the classical Lindley distribution by incorporating a weight parameter, enabling more flexibility in modeling insurance claim severity. We provide a comprehensive theoretical overview of the new model and explore two practical applications. First, we investigate the mean-of-order-P (MOOP(P)) approach for quantifying the expected claim severity based on the NWLi model. Second, we implement a peaks-over-random-threshold (PORT) analysis using the value-at-risk metric to assess extreme claim occurrences under the new model. Further, we provide a simulation study to evaluate the accuracy of the estimators under various methods. The proposed model and its applications provide a versatile tool for actuaries and risk analysts to analyze and predict extreme insurance claim severity, offering insights into risk management and decision-making within the insurance industry. Full article
(This article belongs to the Section Reliability Engineering)
18 pages, 1035 KiB  
Article
Bidirectional f-Divergence-Based Deep Generative Method for Imputing Missing Values in Time-Series Data
by Wen-Shan Liu, Tong Si, Aldas Kriauciunas, Marcus Snell and Haijun Gong
Stats 2025, 8(1), 7; https://doi.org/10.3390/stats8010007 - 14 Jan 2025
Viewed by 522
Abstract
Imputing missing values in high-dimensional time-series data remains a significant challenge in statistics and machine learning. Although various methods have been proposed in recent years, many struggle with limitations and reduced accuracy, particularly when the missing rate is high. In this work, we present a novel f-divergence-based bidirectional generative adversarial imputation network, tf-BiGAIN, designed to address these challenges in time-series data imputation. Unlike traditional imputation methods, tf-BiGAIN employs a generative model to synthesize missing values without relying on distributional assumptions. The imputation process is achieved by training two neural networks, implemented using bidirectional modified gated recurrent units, with f-divergence serving as the objective function to guide optimization. Compared to existing deep learning-based methods, tf-BiGAIN introduces two key innovations. First, the use of f-divergence provides a flexible and adaptable framework for optimizing the model across diverse imputation tasks, enhancing its versatility. Second, the use of bidirectional gated recurrent units allows the model to leverage both forward and backward temporal information. This bidirectional approach enables the model to effectively capture dependencies from both past and future observations, enhancing its imputation accuracy and robustness. We applied tf-BiGAIN to analyze two real-world time-series datasets, demonstrating its superior performance in imputing missing values and outperforming existing methods in terms of accuracy and robustness. Full article
30 pages, 6909 KiB  
Article
The Use of Modern Robust Regression Analysis with Graphics: An Example from Marketing
by Marco Riani, Anthony C. Atkinson, Gianluca Morelli and Aldo Corbellini
Stats 2025, 8(1), 6; https://doi.org/10.3390/stats8010006 - 8 Jan 2025
Viewed by 540
Abstract
Routine least squares regression analyses may sometimes miss important aspects of data. To exemplify this point we analyse a set of 1171 observations from a questionnaire intended to illuminate the relationship between customer loyalty and perceptions of such factors as price and community outreach. Our analysis makes much use of graphics and data monitoring to provide a paradigmatic example of the use of modern robust statistical tools based on graphical interaction with data. We start with least squares regression and find significant regression on all factors. However, a variety of plots show that there are some unexplained features, which are not eliminated by response transformation. Accordingly, we turn to robust analyses, intended to give answers unaffected by the presence of data contamination. A robust analysis using a non-parametric model leads to the increased significance of transformations of the explanatory variables. These transformations provide improved insight into consumer behaviour. We provide suggestions for a structured approach to modern robust regression and give links to the software used for our data analyses. Full article
(This article belongs to the Section Regression Models)
17 pages, 1524 KiB  
Article
Exact Inference for Random Effects Meta-Analyses for Small, Sparse Data
by Jessica Gronsbell, Zachary R. McCaw, Timothy Regis and Lu Tian
Stats 2025, 8(1), 5; https://doi.org/10.3390/stats8010005 - 7 Jan 2025
Viewed by 385
Abstract
Meta-analysis aggregates information across related studies to provide more reliable statistical inference and has been a vital tool for assessing the safety and efficacy of many high-profile pharmaceutical products. A key challenge in conducting a meta-analysis is that the number of related studies is typically small. Applying classical methods that are asymptotic in the number of studies can compromise the validity of inference, particularly when heterogeneity across studies is present. Moreover, serious adverse events are often rare and can result in one or more studies with no events in at least one study arm. Practitioners remove studies in which no events have occurred in one or both arms or apply arbitrary continuity corrections (e.g., adding one event to arms with zero events) to stabilize or define effect estimates in such settings, which can further invalidate subsequent inference. To address these significant practical issues, we introduce an exact inference method for random effects meta-analysis of a treatment effect in the two-sample setting with rare events, which we coin “XRRmeta”. In contrast to existing methods, XRRmeta provides valid inference for meta-analysis in the presence of between-study heterogeneity and when the event rates, number of studies, and/or the within-study sample sizes are small. Extensive numerical studies indicate that XRRmeta does not yield overly conservative inference. We apply our proposed method to two real-data examples using our open-source R package. Full article
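The continuity-correction problem the abstract describes shows up already in a single 2×2 table: with zero events in one arm the odds ratio is undefined, and adding a constant to every cell produces an estimate that depends on the arbitrary constant chosen. A minimal sketch with hypothetical counts:

```python
def odds_ratio(a, b, c, d, corr=0.0):
    """Odds ratio for a 2x2 table (a, b = events / non-events in the
    treatment arm; c, d = events / non-events in the control arm),
    with an optional constant added to every cell as a 'continuity
    correction'."""
    a, b, c, d = (x + corr for x in (a, b, c, d))
    return (a * d) / (b * c)

# Hypothetical study with zero events in the control arm
a, b, c, d = 3, 97, 0, 100
# odds_ratio(a, b, c, d)  # would divide by zero: the raw estimate is undefined
or_half = odds_ratio(a, b, c, d, corr=0.5)  # the common 0.5 correction
or_one = odds_ratio(a, b, c, d, corr=1.0)   # an equally arbitrary alternative
assert or_half > or_one > 1.0  # the estimate moves with the chosen constant
```

An exact method of the kind the abstract proposes sidesteps this arbitrariness by requiring no correction at all.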
16 pages, 12412 KiB  
Communication
Spatial Clusters of Gambling Outlet: A Machine Learning Tree-Based Algorithm
by Salvador Martínez-Cava, Fernando A. López and Mª Luz Maté Sánchez-del-Val
Stats 2025, 8(1), 4; https://doi.org/10.3390/stats8010004 - 3 Jan 2025
Viewed by 790
Abstract
The localization of gambling establishments is a relevant topic in gambling research. In this paper, we analyze the spatial distribution of two types of gambling establishments—private and public—over the last 10 years in the municipality of Madrid (Spain). Using a spatial scan statistic, we identify the temporal dynamics of spatial clusters with high densities. The results reveal different spatial patterns regarding the locations of these two types of gambling establishments. While public gambling establishments do not exhibit spatial clustering, private gambling establishments show a growth in spatial clustering with dynamic behavior, seeking locations with specific sociodemographic characteristics. A machine learning tree-based algorithm is used to confirm that decisions on where to place new gambling establishments are based on targeting customers with a gambling profile. Full article
15 pages, 497 KiB  
Article
Comparing Robust Haberman Linking and Invariance Alignment
by Alexander Robitzsch
Stats 2025, 8(1), 3; https://doi.org/10.3390/stats8010003 - 2 Jan 2025
Viewed by 405
Abstract
Linking methods are widely used in the social sciences to compare group differences regarding the mean and the standard deviation of a factor variable. This article examines a comparison between robust Haberman linking (HL) and invariance alignment (IA) for factor models with dichotomous and continuous items, utilizing the L0.5 and L0 loss functions. A simulation study demonstrates that HL outperforms IA when item intercepts are used for linking, rather than the original HL approach, which relies on item difficulties. The results regarding the choice of loss function were mixed: L0 showed superior performance in the simulation study with continuous items, while L0.5 performed better in the study with dichotomous items. Full article
(This article belongs to the Section Computational Statistics)
35 pages, 2379 KiB  
Communication
Seasonal Analysis and Risk Management Strategies for Credit Guarantee Funds: A Case Study from Republic of Korea
by Juryon Paik and Kwangho Ko
Stats 2025, 8(1), 2; https://doi.org/10.3390/stats8010002 - 26 Dec 2024
Viewed by 451
Abstract
This study investigates the prediction of small and medium-sized enterprise (SME) default rates in the Republic of Korea by comparing the performance of three prominent time-series forecasting models: ARIMA, SARIMA, and Prophet. The research utilizes a comprehensive dataset provided by the Korea Credit Guarantee Fund (KODIT), which covers regional and monthly default rates from January 2012 to December 2023, spanning 12 years. By focusing on the Republic of Korea’s 17 major cities, the study aims to identify regional and seasonal patterns in default rates, highlighting the critical role that regional economic conditions and seasonality play in risk management. The proposed methodology includes an exploratory analysis of default rate trends and seasonal patterns, followed by a comparative evaluation of ARIMA, SARIMA, and Prophet models. ARIMA serves as a baseline model for capturing non-seasonal trends, while SARIMA incorporates seasonal components to handle recurring patterns. Prophet is uniquely suited for dynamic datasets, offering the ability to include external factors such as holidays or economic shocks. This work distinguishes itself from others by combining these three models to provide a comprehensive approach to regional and seasonal default risk forecasting, offering insights specific to the Republic of Korea’s economic landscape. Each model is evaluated based on its ability to capture trends, seasonality, and irregularities in the data. The ARIMA model shows strong performance in stable economic environments, while SARIMA proves effective in modeling seasonal patterns. The Prophet model, however, demonstrates superior flexibility in handling irregular trends and external events, making it the most accurate model for predicting default rates across varied economic regions. The study concludes that Prophet’s adaptability to irregularities and external factors positions it as the most suitable model for dynamic economic conditions.
These findings emphasize the importance of region-specific and seasonal factors in tailoring risk forecasting models. Future research will validate these predictions by comparing forecasted default rates with actual data from 2024, providing actionable insights into the long-term effectiveness of the proposed methods. This comparison aims to refine the models further, ensuring robust financial stability and enhanced SME support strategies for institutions like KODIT. Full article
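The seasonal structure that separates SARIMA from plain ARIMA can be made visible with a simple month-of-year profile; a minimal sketch on a simulated monthly series (synthetic numbers, not KODIT default rates):

```python
import math

# Simulate 12 years of monthly default rates: mild trend plus an annual cycle
series = [1.5 + 0.002 * t                                 # slow upward trend
          + 0.4 * math.sin(2 * math.pi * (t % 12) / 12)   # annual seasonality
          for t in range(12 * 12)]

# Month-of-year profile: average the series over each calendar month
profile = [sum(series[m::12]) / len(series[m::12]) for m in range(12)]
peak_month = max(range(12), key=lambda m: profile[m])

# A seasonal model (SARIMA's seasonal terms, or Prophet's yearly component)
# targets exactly this recurring structure; plain ARIMA must difference it away.
assert peak_month == 3  # sin(2*pi*m/12) peaks at m = 3, the fourth month
```

On real data the same profile could be computed per region before model fitting, matching the regional focus of the study.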
16 pages, 2022 KiB  
Communication
Survival Times of Transplanted Kidneys Among Different Donor–Recipient Cohorts: The United States Registry Analysis from 1987 to 2018, Part 1: Gender and Ethnicity
by Nezamoddin N. Kachouie, Alain Despeignes and Daniel Breininger
Stats 2025, 8(1), 1; https://doi.org/10.3390/stats8010001 - 26 Dec 2024
Viewed by 430
Abstract
Over seven thousand people on average die each year in the United States waiting for an organ transplant due to the shortage of donated organs. Given this alarming concern, efforts by health organizations such as the United Network for Organ Sharing (UNOS) and by government officials to share transplant data inspire the investigation of the characteristics of donors and recipients that affect the longevity of donated organs. The goal of this study is to investigate the survival time of kidneys transplanted from 1987 to 2018 with respect to the donors’ and the recipients’ characteristics. Survival analysis is performed to determine the characteristics associated with the survival time of transplanted kidneys. Our results indicate a noticeable correlation between survival time and the matching ethnicity of donor and recipient. However, the optimal survival time was not necessarily associated with matching genders of donor and recipient. It was observed that, on average, male-to-female kidney transplants have longer survival times. The premise of this study was the use of statistical analysis methods to identify general trends in survival times of transplanted kidneys among different patient cohorts available through the UNOS registry. We must emphasize that the context of this research is bounded within the domain of statistical analysis and within the scope of the methods that were employed in this study. The outcomes of this study are of statistical interest, with potential clinical significance. Full article