Article

Copula-Based Regression with Mixed Covariates

1 Department of Statistics and Business Analytics, UAE University, Al Ain P.O. Box 15551, United Arab Emirates
2 Département de Mathématiques et d’Informatique, Université du Québec à Trois-Rivières, Trois-Rivières, QC G9A 5H7, Canada
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(22), 3525; https://doi.org/10.3390/math12223525
Submission received: 24 August 2024 / Revised: 18 October 2024 / Accepted: 29 October 2024 / Published: 12 November 2024
(This article belongs to the Section Probability and Statistics)

Abstract

In this paper, we focus on developing copula-based modeling procedures that effectively capture the dependence between response and explanatory variables. Building upon the work of Noh et al. (J. Am. Stat. Assoc. 2013, 108, 676–688), we extend copula-based regression to accommodate both continuous and discrete covariates. Specifically, we explore the construction of copulas to estimate the conditional mean of the response variable given the covariates, elucidating the relationship between copula structures and marginal distributions. We consider various estimation methods for copulas and distribution functions, presenting a diverse array of estimators for the conditional mean function. These estimators range from non-parametric to semi-parametric and fully parametric, offering flexibility in modeling regression relationships. An adapted algorithm is applied to construct copulas, and simulations are carried out to replicate datasets, estimate prediction model parameters, and compare the results with the OLS method. The practicality and efficacy of the proposed methodologies, grounded in the principles of copula-based regression, are substantiated through methodical simulation studies.
MSC:
60E05; 62H05; 62H20; 62J05; 62G05; 62G08

1. Introduction

Scientists are often led to study the relationships and dependencies between a response variable and several covariates. Regression analysis is the standard statistical tool for investigating such relationships, and it is one of the most commonly used statistical methods in many scientific fields, such as medicine, biology, agriculture, economics, engineering, and sociology. In medical research, econometrics, and other fields, it is very common to use regression analysis to interpret the association between different variables. However, the basic form of regression analysis is not suitable for many cases, where the relationships are often non-linear and the probability distribution of the response variable may not be normal.
For such dependence modeling problems, we attempt to provide a functional form that summarizes the relationship between the response and the explanatory variables. In many practical situations, a vector of covariates X = (X_1, …, X_d) is used to explain, interpret, or predict a response variable Y; this is encountered in many fields, including medicine and the social sciences. The type of functional relationship we attempt to identify may depend on the marginal behavior of the variables or on their joint behavior. In this paper, we construct dependence modeling procedures based on the separation of these two behaviors when the covariates are a mixture of continuous and discrete variables.
For this context, we consider procedures that allow the representation of a multivariate distribution as a function of its uni-variate marginals through a connection function called a copula. Copulas have been increasingly popular for modeling statistical dependence in multivariate sets of data and have been applied to various areas, including medical research, environmental science, econometrics, actuarial science, agronomy, and others. A key feature of copulas is that they provide flexible representations of the multivariate distribution by allowing for the dependence structure of the variables of interest to be modeled separately from the marginal structure and, by specifying a copula, we summarize all the dependencies between margins (see Nelsen [1] for more about this subject).
The power of this approach lies principally in the practitioner's ability to model the dependence structure independently of the marginal behaviors. Copulas also allow both linear and non-linear dependence to be modeled, permit an arbitrary choice of marginal distributions, and are capable of capturing tail (extreme-value) behavior. The principal advantage of copula regression is that it places no restrictions on the probability distributions that can be used.
It is interesting to note that copula-based regression models offer significant advantages in capturing complex dependencies between variables, making them highly useful in various fields. In finance, they allow for better portfolio risk management by modeling non-linear dependencies between asset returns and macroeconomic factors, especially during market downturns. In insurance, copula-based regression can be applied to explain pricing in terms of different dependent types of claims, such as frequency and severity. In environmental studies, regression based on a copula is useful for establishing the relationship between rainfall and river discharge, especially in the case of non-linear dependence. In healthcare, regression with a copula enables researchers to examine how lifestyle factors influence health outcomes, such as cholesterol levels, while capturing the potential interdependence among these health indicators.
In the literature, there are many recent studies of regression based on copulas; as examples, we cite Sheikhi et al. [2] and Ali et al. [3], among others. As a new contribution to this domain, we consider in this paper the estimation of the mean regression function for a regression model, where X = (X_1, …, X_d)^T is a random vector of dimension d ≥ 1 and Y is a random variable with cumulative distribution function (c.d.f.) F_0 and density function f_0. Here, Y is the response variable and X is the set of covariates. We denote by F_i the c.d.f. of the variable X_i and by f_i its corresponding density. For a given x = (x_1, …, x_d)^T, we write F(x) as shorthand for (F_1(x_1), …, F_d(x_d)). From the inspiring work of Sklar [4], the c.d.f. of (Y, X^T)^T evaluated at (y, x^T)^T can be expressed as C(F_0(y), F(x)), where C is the copula distribution of (Y, X^T)^T, that is, the function from [0,1]^{d+1} to [0,1] defined by
C(u_0, u_1, …, u_d) = P( F_0(Y) ≤ u_0, F_1(X_1) ≤ u_1, …, F_d(X_d) ≤ u_d ).
Recently, Noh et al. [5] exploited the above decomposition to introduce a novel idea consisting of expressing the mean regression function m(x) in terms of the copula and the margins as follows:
m(x) = E[ Y c( F_0(Y), F(x) ) ] / c_X( F(x) ),     (1)
where c(u_0, v) ≡ c(u_0, v_1, …, v_d) is the copula density corresponding to C and c_X(v) ≡ c_X(v_1, …, v_d) is the copula density of X. This shows that the mean regression function m(x) is the ratio of a numerator that captures the mean dependence between Y and X and a denominator that captures the dependence within X. It is worth mentioning that formula (1) is only valid when the covariates are continuous. A new formulation is needed when the covariates are not all continuous, which is the case in many real-world applications, especially in medicine.
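To make formula (1) concrete, the following is a minimal numerical sketch, not taken from the paper, for a single continuous covariate (d = 1), where the denominator c_X is identically 1. It assumes a bivariate Gaussian copula with standard normal margins, so the true regression curve is m(x) = ρx, and approximates the expectation by Monte Carlo; all function names and sample sizes are illustrative choices.

```python
# Monte Carlo illustration of m(x) = E[Y c(F0(Y), F1(x))] for one continuous covariate.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
rho = 0.6

def gaussian_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula evaluated at (u, v)."""
    s, t = norm.ppf(u), norm.ppf(v)
    return np.exp((2 * rho * s * t - rho**2 * (s**2 + t**2)) / (2 * (1 - rho**2))) / np.sqrt(1 - rho**2)

y = rng.standard_normal(200_000)          # draws from F0 = N(0, 1)
for x in (-1.0, 0.0, 1.5):
    m_hat = np.mean(y * gaussian_copula_density(norm.cdf(y), norm.cdf(x), rho))
    print(f"x = {x:+.1f}:  copula formula {m_hat:+.3f}   true rho*x {rho * x:+.3f}")
```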
Furthermore, Noh et al. [5] proposed a semi-parametric estimator for the regression function given in (1). Specifically, they utilized the inference function for margins (IFM) technique to estimate the copula-based regression curve. This method proceeds in two stages: first, it estimates the marginal parameters, and then it estimates the corresponding dependence parameter. These authors demonstrated, both theoretically and empirically, that the resulting estimators exhibit desirable properties when the parametric copula family is adequately chosen.
Noh et al. [5] stimulated extensive research on copula-based regression. Noh et al. [6] applied the method of Noh et al. [5] to quantile regression with i.i.d. or time series data that are completely observed. De Backer et al. [7] extended the method of Noh et al. [6] to quantile regression with censored data. Kraus and Czado [8] studied quantile regression with complete data using D-vine copulas. Rémillard et al. [9] discussed the asymptotic connection between the estimators of Noh et al. [6] and Kraus and Czado [8]. Chang and Joe [10] proposed an algorithm for computing the conditional distribution function via the vine copula. Furthermore, Nagler and Vatter [11] unified various copula-based regressions by formulating a general loss function which may not be continuously differentiable. Their generalized regression model includes the conditional mean regression of Noh et al. [5], the conditional quantile regression of Noh et al. [6], and the asymmetric least squares of Newey and Powell [12] as special cases. The unified framework enhances the systematic interpretation of the different existing regressions. For additional discussion of similar methods, see [13,14,15,16,17] and the literature cited therein.
As an extension of the framework of Noh et al. [5], we incorporate discrete variables into the set of covariates X_1, …, X_d. By establishing a connection with various classes of copulas through an alternative equation to (1), we calculate the conditional mean m(x) of Y given X = x. In this context, we develop the relationship between the copula and the marginals. Furthermore, we illustrate this relationship for specific families of copulas, such as Archimedean copulas and the Gaussian copula, highlighting the properties that are beneficial for our analysis.
The next step involves addressing the estimation problem. Here, we also adopt a semi-parametric approach along with the inference function for margins (IFM) method to estimate the proposed regression curve. First, we estimate the marginal distributions using their empirical distributions, and then we estimate the dependence parameter associated with the underlying copula. Simulation studies for different classes of copulas and different distributions of the output Y are considered to illustrate the usefulness of the findings.
The rest of the paper is organized as follows. Section 2 discusses different copula concepts in the multivariate setting. Section 3 outlines the copula-based regression model proposed for the case where the set of covariates includes both discrete and continuous variables. Section 4 covers the estimation procedure for the proposed regression model. Section 5 is dedicated to a simulation study that assesses the performance of the suggested copula-based regression. Conclusions and remarks are given in Section 6.

2. Preliminaries

Copulas are a mathematical concept used in multivariate analysis to describe the dependence structure between the components of a multivariate random vector. They play a central role in various fields employing multivariate statistical analysis, such as risk management and finance. In essence, copulas provide a framework for modeling the relationships between variables by describing their joint distribution independently of their marginal distributions.
This section provides a brief overview of the copula concept, which will be utilized in the development of the proposed model. According to Nelsen [1], a multivariate copula is defined as follows.
Definition 1.
A d-dimensional copula is a function C from [0,1]^d to [0,1] with the following properties:
1. For every u = (u_1, …, u_d) ∈ [0,1]^d,
(i) C(u) = 0 if at least one coordinate of u is 0;
(ii) C(u) = u_k if all coordinates of u are 1 except u_k.
2. For every a = (a_1, …, a_d) and b = (b_1, …, b_d) in [0,1]^d such that a_i ≤ b_i, i = 1, …, d,
Δ_{a_d}^{b_d} Δ_{a_{d-1}}^{b_{d-1}} ⋯ Δ_{a_1}^{b_1} C(u) ≥ 0,  for all u ∈ [0,1]^d,
where
Δ_{a_k}^{b_k} C(u) = C(u_1, …, u_{k-1}, b_k, u_{k+1}, …, u_d) − C(u_1, …, u_{k-1}, a_k, u_{k+1}, …, u_d).
Sklar’s Theorem is a fundamental result in copula theory. It enables us to express the joint distribution of a multivariate random vector in terms of their marginal distributions and a copula function. It can be stated as follows (see, Nelsen [1]).
Theorem 1.
Let H be a d-dimensional distribution function with marginal distributions F_1, F_2, …, F_d. Then, there exists a d-copula C such that, for all (x_1, …, x_d) ∈ ℝ^d,
H(x_1, …, x_d) = C( F_1(x_1), F_2(x_2), …, F_d(x_d) ).
If F_1, F_2, …, F_d are all continuous, then C is unique; otherwise, C is uniquely determined on Ran(F_1) × ⋯ × Ran(F_d). Conversely, if C is a d-copula and F_1, F_2, …, F_d are distribution functions, then the function H defined by the above equation is a d-dimensional distribution function with margins F_1, F_2, …, F_d.
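As a small illustration of Sklar's Theorem (not part of the original paper), the sketch below assembles a bivariate joint c.d.f. from an exponential and a standard normal margin coupled by a Clayton copula; sending one argument to a large value approximately recovers the other margin, as the theorem implies. The margins and the parameter value are arbitrary choices for the demo.

```python
# Building a joint c.d.f. H(x1, x2) = C(F1(x1), F2(x2)) from margins and a copula.
import numpy as np
from scipy.stats import expon, norm

theta = 2.0  # Clayton dependence parameter (illustrative value)

def clayton_cdf(u, v, theta):
    return (u**(-theta) + v**(-theta) - 1.0)**(-1.0 / theta)

def joint_cdf(x1, x2):
    """H(x1, x2) = C(F1(x1), F2(x2)) with F1 = Exp(1), F2 = N(0, 1)."""
    return clayton_cdf(expon.cdf(x1), norm.cdf(x2), theta)

print(joint_cdf(1.0, 0.5))                     # joint probability P(X1 <= 1, X2 <= 0.5)
print(joint_cdf(1.0, 50.0), expon.cdf(1.0))    # ~ F1(1) once the second argument is large
print(joint_cdf(50.0, 0.5), norm.cdf(0.5))     # ~ F2(0.5) once the first argument is large
```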
It is well-known that Sklar’s Theorem has numerous practical applications in various fields involving multivariate data analysis. For instance, Sklar’s Theorem is commonly employed to analyze dependencies among different financial assets. It enables us to understand how the dependence structure between the prices of different assets might affect the overall risk of a portfolio.

2.1. Archimedean Copulas

Archimedean copulas constitute an important class of parametric copulas. This type of copula describes the dependence structure between random variables with greater flexibility through a single function called the generator. The latter is often expressed in terms of dependence parameters that control the strength of dependence among the components of a given random vector.
The generator of a d-dimensional Archimedean copula is a continuous, strictly decreasing function ϕ from [0,1] to [0,∞] such that ϕ(1) = 0. Let ψ denote its inverse, ψ = ϕ^{-1}, often called the inverse generator; it is a decreasing and continuous function from [0,∞) to [0,1] with ψ(0) = 1 and ψ(∞) = 0. Suppose that ψ is differentiable up to order d − 2, with derivatives denoted ψ^{(i)} for i = 1, …, d − 2. Hereafter is the definition of a multivariate Archimedean copula. For details on this subject, see McNeil and Nešlehová [18].
Definition 2.
The d-dimensional Archimedean copula C_ϕ is defined through its generator ϕ as follows:
C_ϕ(u_1, …, u_d) = ψ( ϕ(u_1) + ⋯ + ϕ(u_d) ),  for all (u_1, …, u_d) ∈ [0,1]^d,
where the inverse generator ψ is subject to the conditions that (−1)^i ψ^{(i)}(x) ≥ 0 for i = 1, …, d − 2, and (−1)^{d−2} ψ^{(d−2)} is non-increasing and convex.
Hereafter, we present the Clayton copula and the Frank copula, both considered among the most popular multivariate Archimedean copulas. The d-dimensional Clayton copula is defined as follows:
C_θ(u_1, …, u_d) = ( u_1^{−θ} + ⋯ + u_d^{−θ} − d + 1 )^{−1/θ},  for all (u_1, …, u_d) ∈ [0,1]^d.     (2)
It is an Archimedean copula whose inverse generator function is defined, for all t > 0 , by
ψ_θ(t) = (1 + θt)^{−1/θ},  θ ∈ (0, ∞).     (3)
Likewise, the d-dimensional Frank copula is expressed, for all (u_1, …, u_d) ∈ [0,1]^d, by
C_θ(u_1, …, u_d) = −(1/θ) ln[ 1 + (e^{−θu_1} − 1)(e^{−θu_2} − 1) ⋯ (e^{−θu_d} − 1) / (e^{−θ} − 1)^{d−1} ],  θ > 0.     (4)
Its generator is given, for all t ∈ (0, 1], by
ϕ_θ(t) = −ln[ (e^{−θt} − 1) / (e^{−θ} − 1) ],  θ > 0.
For d ≥ 3, the Frank copula describes only positive dependence, whereas in the two-dimensional case this copula models both positive and negative association.
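The following brief sketch, written for illustration only, checks the Archimedean construction numerically for the Clayton family: the trivariate copula built as ψ(ϕ(u_1) + ϕ(u_2) + ϕ(u_3)) from the generator ϕ and its inverse ψ coincides with the closed form (2). The parameter value is an arbitrary assumption.

```python
# Archimedean construction psi(sum of phi(u_j)) vs. the explicit Clayton formula (2).
import numpy as np

theta = 1.5
phi = lambda t: (t**(-theta) - 1.0) / theta          # Clayton generator
psi = lambda s: (1.0 + theta * s)**(-1.0 / theta)    # its inverse (inverse generator)

def clayton_archimedean(u):
    u = np.asarray(u, dtype=float)
    return psi(np.sum(phi(u)))

def clayton_closed_form(u):
    u = np.asarray(u, dtype=float)
    return (np.sum(u**(-theta)) - u.size + 1.0)**(-1.0 / theta)

rng = np.random.default_rng(1)
for _ in range(3):
    u = rng.uniform(0.05, 0.95, size=3)
    print(clayton_archimedean(u), clayton_closed_form(u))   # the two values agree
```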

2.2. Gaussian Copulas

The d-dimensional Gaussian copula C_Σ is defined through the standardized d-variate normal distribution Φ_Σ. The correlation matrix Σ contains the dependence parameters of this copula. Specifically, C_Σ is expressed by
C_Σ(u_1, …, u_d) = Φ_Σ( Φ^{−1}(u_1), …, Φ^{−1}(u_d) ),  for all (u_1, …, u_d) ∈ [0,1]^d,
where Φ denotes the standard normal distribution. In other words, the multivariate Gaussian copula is explicitly given by,
C_Σ(u_1, …, u_d) = ∫_{−∞}^{Φ^{−1}(u_1)} ⋯ ∫_{−∞}^{Φ^{−1}(u_d)} (2π)^{−d/2} det(Σ)^{−1/2} exp( −x^⊤ Σ^{−1} x / 2 ) dx,
that is, the N(0, Σ) distribution function evaluated at the vector of normal quantiles. The bivariate Gaussian copula reduces to
C_ρ(u_1, u_2) = ∫_{−∞}^{Φ^{−1}(u_1)} ∫_{−∞}^{Φ^{−1}(u_2)} 1 / ( 2π √(1 − ρ²) ) exp( −( x² − 2ρxy + y² ) / ( 2(1 − ρ²) ) ) dx dy,
where ρ represents the Pearson correlation coefficient, a parameter within the range [ 1 , 1 ] , serving as the dependence parameter for this copula.
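As a short illustration (not from the paper), the sketch below evaluates a trivariate Gaussian copula exactly as in the definition above, by plugging normal quantiles into the multivariate normal c.d.f.; the correlation matrix is an arbitrary valid example.

```python
# Evaluating C_Sigma(u) = Phi_Sigma(Phi^{-1}(u1), ..., Phi^{-1}(ud)).
import numpy as np
from scipy.stats import norm, multivariate_normal

Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])   # illustrative correlation matrix

def gaussian_copula_cdf(u, Sigma):
    z = norm.ppf(np.asarray(u, dtype=float))
    return multivariate_normal(mean=np.zeros(len(u)), cov=Sigma).cdf(z)

print(gaussian_copula_cdf([0.2, 0.6, 0.7], Sigma))
# With the identity matrix the copula reduces to the independence copula u1*u2*u3.
print(gaussian_copula_cdf([0.2, 0.6, 0.7], np.eye(3)), 0.2 * 0.6 * 0.7)
```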

3. Model Description

Starting from a random vector (Y, X), where Y is a continuous random variable with cumulative distribution function F_0, assume that X = (X_q, X_{d−q}), where the random vectors X_q = (X_1, …, X_q) and X_{d−q} = (X_{q+1}, …, X_d) are continuous and discrete, respectively, and suppose without loss of generality that, for any i = q+1, …, d, Ran(X_i) ⊆ ℕ = {0, 1, 2, …}. Denote by F_1, …, F_d the distribution functions of X_1, …, X_d, respectively, and, for all x ∈ ℝ^q × ∏_{i=q+1}^d Ran(X_i), let F_d(x) = ( F_1(x_1), …, F_d(x_d) ). Let C and C_X be the copulas of (Y, X) and X, respectively. Moreover, for (u, v_d) = (u, v_1, …, v_d) ∈ (0,1)^{d+1}, set
∂_{q+1} C(u, v_d) = ∂^{q+1} C(u, v_d) / (∂u ∂v_1 ⋯ ∂v_q)   and   ∂_q C_X(v_d) = ∂^q C_X(v_d) / (∂v_1 ⋯ ∂v_q).     (6)
For i = q+1, …, d, set δ_i = (δ_{i,1}, …, δ_{i,d}) such that δ_{i,i} = 1 and δ_{i,j} = 0 for j ≠ i. For i = q+1, …, d, let Δ_i be the difference operator defined by
Δ_i ∂_{q+1} C( F_0(k), F_d(x) ) = ∂_{q+1} C( F_0(k), F_d(x) ) − ∂_{q+1} C( F_0(k), F_d(x − δ_i) ),
Δ_i ∂_q C_X( F_d(x) ) = ∂_q C_X( F_d(x) ) − ∂_q C_X( F_d(x − δ_i) ),
and set Δ_q^d = ∏_{i=q+1}^d Δ_i.
Proposition 1.
For all x ∈ ℝ^q × ∏_{i=q+1}^d Ran(X_i), the conditional mean of Y given X = x is expressed by
m(x) = E[ Y Δ_q^d ∂_{q+1} C( F_0(Y), F_d(x) ) ] / Δ_q^d ∂_q C_X( F_d(x) ).     (7)
Proof. 
For all x ∈ ℝ^q × ∏_{i=q+1}^d Ran(X_i), let f_{·|x} be the conditional density function of Y given X = x. Clearly, one has
m(x) = E[Y | X = x] = ∫ y f_{y|x}(y) dy
     = ∫ y [ Δ_q^d ∂_{q+1} C( F_0(y), F_d(x) ) f_0(y) f_1(x_1) ⋯ f_q(x_q) ] / [ Δ_q^d ∂_q C_X( F_d(x) ) f_1(x_1) ⋯ f_q(x_q) ] dy
     = ∫ y [ Δ_q^d ∂_{q+1} C( F_0(y), F_d(x) ) f_0(y) ] / [ Δ_q^d ∂_q C_X( F_d(x) ) ] dy
     = E[ Y Δ_q^d ∂_{q+1} C( F_0(Y), F_d(x) ) ] / Δ_q^d ∂_q C_X( F_d(x) ). □
Remark 1.
An expanded expression of the conditional mean can be obtained by using the following expansion of Δ_q^d:
Δ_q^d = 1 + Σ_{∅ ≠ S ⊆ {q+1, …, d}} (−1)^{|S|} ∏_{i ∈ S} T_i,     (8)
where T_i = 1 − Δ_i and |S| denotes the cardinality of a nonempty subset S of {q+1, …, d}. Therefore, one sees from (7) and (8) that m(x) can be expressed as
m(x) = [ E( Y ∂_{q+1} C(F_0(Y), F_d(x)) ) + Σ_{∅ ≠ S ⊆ {q+1, …, d}} (−1)^{|S|} E( Y ∏_{i ∈ S} T_i ∂_{q+1} C(F_0(Y), F_d(x)) ) ] / [ ∂_q C_X(F_d(x)) + Σ_{∅ ≠ S ⊆ {q+1, …, d}} (−1)^{|S|} ∏_{i ∈ S} T_i ∂_q C_X(F_d(x)) ],
where
T_i ∂_{q+1} C( F_0(k), F_d(x) ) = ∂_{q+1} C( F_0(k), F_d(x − δ_i) )   and   T_i ∂_q C_X( F_d(x) ) = ∂_q C_X( F_d(x − δ_i) ).
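For readers who prefer code, here is a generic sketch, written for this exposition, of the operator Δ_q^d applied through the inclusion-exclusion expansion (8); the function g below is a hypothetical stand-in for ∂_{q+1}C or ∂_q C_X.

```python
# Applying Delta_q^d to a function of the covariates via inclusion-exclusion over subsets.
from itertools import combinations
import numpy as np

def delta_qd(g, x, discrete_idx):
    """Apply Delta_q^d to g at the point x; discrete_idx lists the discrete coordinates."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for r in range(len(discrete_idx) + 1):
        for S in combinations(discrete_idx, r):
            shifted = x.copy()
            shifted[list(S)] -= 1.0          # T_i shifts coordinate i down by one unit
            total += (-1)**r * g(shifted)
    return total

# Example with one continuous (index 0) and two discrete (indices 1, 2) covariates:
g = lambda x: np.prod(x)                      # toy function for illustration only
print(delta_qd(g, [0.4, 3.0, 2.0], discrete_idx=[1, 2]))
# For g(x) = x0*x1*x2 this equals 0.4*(3*2 - 2*2 - 3*1 + 2*1) = 0.4
```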

3.1. Archimedean Copula-Based Predicted Mean

Suppose that the dependence structure of (Y, X) is described by an Archimedean copula C with generator ϕ. This means that the copulas C and C_X are expressed, for all (u, v_d) ∈ (0,1)^{d+1}, by
C(u, v_d) = ψ( ϕ(u) + Σ_{j=1}^d ϕ(v_j) )   and   C_X(v_d) = ψ( Σ_{j=1}^d ϕ(v_j) ),
where the function ψ represents the inverse of the generator ϕ. Therefore, the partial derivatives of the copulas C and C_X are given by
∂_{q+1} C(u, v_d) = ψ^{(q+1)}( ϕ(u) + Σ_{j=1}^d ϕ(v_j) ) ϕ′(u) ∏_{j=1}^q ϕ′(v_j),
and
∂_q C_X(v_d) = ψ^{(q)}( Σ_{j=1}^d ϕ(v_j) ) ∏_{j=1}^q ϕ′(v_j).
Hence, the regression curve is given by
m(x) = E[ Y ϕ′(F_0(Y)) Δ_q^d ψ^{(q+1)}( ϕ(F_0(Y)) + Σ_{j=1}^d ϕ(F_j(x_j)) ) ] / Δ_q^d ψ^{(q)}( Σ_{j=1}^d ϕ(F_j(x_j)) ).     (11)
To exemplify the above conditional mean, let us examine the scenario where d = 2 and q = 1, implying that the covariate X_1 is continuous while the covariate X_2 is discrete. In this case, we have
m(x) = E[ Y ϕ′(F_0(Y)) Δ_1^2 ψ″( ϕ(F_0(Y)) + ϕ(F_1(x_1)) + ϕ(F_2(x_2)) ) ] / Δ_1^2 ψ′( ϕ(F_1(x_1)) + ϕ(F_2(x_2)) ),
where
Δ_1^2 ψ″( ϕ(F_0(Y)) + ϕ(F_1(x_1)) + ϕ(F_2(x_2)) ) = ψ″( ϕ(F_0(Y)) + ϕ(F_1(x_1)) + ϕ(F_2(x_2)) ) − ψ″( ϕ(F_0(Y)) + ϕ(F_1(x_1)) + ϕ(F_2(x_2 − 1)) ),
and
Δ_1^2 ψ′( ϕ(F_1(x_1)) + ϕ(F_2(x_2)) ) = ψ′( ϕ(F_1(x_1)) + ϕ(F_2(x_2)) ) − ψ′( ϕ(F_1(x_1)) + ϕ(F_2(x_2 − 1)) ).
Example 1.
To illustrate Equation (11), let us assume that (Y, X_1, X_2) follows the Clayton copula C_θ described in (2). Specifically, for all θ ∈ (0, ∞),
C_θ(u, v_1, v_2) = ( u^{−θ} + v_1^{−θ} + v_2^{−θ} − 2 )^{−1/θ},  for all (u, v_1, v_2) ∈ [0,1]^3.
The generator ϕ_θ of this copula and its inverse ψ_θ given in (3) satisfy
ϕ_θ(t) = (t^{−θ} − 1)/θ,   ψ′_θ(t) = −(1 + θt)^{−1/θ − 1},   ψ″_θ(t) = (θ + 1)(1 + θt)^{−1/θ − 2}.
Hence, standard calculations show that (11) reduces to
m(x) = (1 + θ) E[ Y F_0(Y)^{−θ−1} Δ_1^2 ( F_0(Y)^{−θ} + F_1(x_1)^{−θ} + F_2(x_2)^{−θ} − 2 )^{−1/θ − 2} ] / Δ_1^2 ( F_1(x_1)^{−θ} + F_2(x_2)^{−θ} − 1 )^{−1/θ − 1}.     (12)
Likewise, let us express Equation (11) when (Y, X_1, X_2) follows the Frank copula C_θ given in (4), namely,
C_θ(u, v_1, v_2) = −(1/θ) ln[ 1 + (e^{−θu} − 1)(e^{−θv_1} − 1)(e^{−θv_2} − 1) / (e^{−θ} − 1)^2 ],  θ > 0.
Calculations similar to those used previously lead to
m(x) = E[ Y Δ_1^2 K_{1,θ}( F_0(Y), F_1(x_1), F_2(x_2) ) ] / Δ_1^2 K_{2,θ}( F_1(x_1), F_2(x_2) ),     (13)
where
K_{1,θ}(u, v_1, v_2) = ∂² C_θ(u, v_1, v_2) / (∂u ∂v_1) = θ e^{−θu} e^{−θv_1} (1 − e^{−θv_2}) (1 − e^{−θ})² / [ (1 − e^{−θ})² + (e^{−θu} − 1)(e^{−θv_1} − 1)(e^{−θv_2} − 1) ]²,     (14)
and
K_{2,θ}(v_1, v_2) = ∂ C_{X,θ}(v_1, v_2) / ∂v_1 = e^{−θv_1} (1 − e^{−θv_2}) / [ (1 − e^{−θ}) − (e^{−θv_1} − 1)(e^{−θv_2} − 1) ].     (15)
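To make the Clayton-based conditional mean (12) concrete, the following hedged numerical sketch evaluates it with illustrative margins F_0 = N(0,1), F_1 = U(0,1), and the three-point distribution for X_2 used later in Section 5; θ and the margins are assumptions chosen only for the demo, and the outer expectation over Y ~ F_0 is approximated by Monte Carlo.

```python
# Monte Carlo evaluation of the Clayton-based conditional mean (12) under assumed margins.
import numpy as np
from scipy.stats import norm, uniform

theta = 2.0
p1, p2 = 0.3, 0.5

def F2(k):
    """c.d.f. of the discrete covariate X2 supported on {1, 2, 3}."""
    if k < 1:
        return 0.0
    if k < 2:
        return p1
    if k < 3:
        return p1 + p2
    return 1.0

def m_clayton(x1, x2, y_draws):
    """Evaluate (12) at (x1, x2), with y_draws drawn from F0 = N(0, 1)."""
    u0, u1 = norm.cdf(y_draws), uniform.cdf(x1)

    def num_term(v2):   # numerator term before differencing in x2; zero when F2 = 0
        if v2 <= 0.0:
            return 0.0
        return (u0**(-theta) + u1**(-theta) + v2**(-theta) - 2.0)**(-1.0 / theta - 2.0)

    def den_term(v2):   # denominator term before differencing in x2; zero when F2 = 0
        if v2 <= 0.0:
            return 0.0
        return (u1**(-theta) + v2**(-theta) - 1.0)**(-1.0 / theta - 1.0)

    numer = np.mean(y_draws * u0**(-theta - 1.0) * (num_term(F2(x2)) - num_term(F2(x2 - 1))))
    denom = den_term(F2(x2)) - den_term(F2(x2 - 1))
    return (1.0 + theta) * numer / denom

rng = np.random.default_rng(2)
y_draws = rng.standard_normal(100_000)
for x2 in (1, 2, 3):
    print("m(0.5, %d) =" % x2, m_clayton(0.5, x2, y_draws))
```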

3.2. Gaussian Copula-Based Predicted Mean

This section presents the expression of the regression curve when the copula C of (Y, X) is Gaussian. This means that the copula C is expressed in terms of the standardized (d+1)-variate normal distribution Φ_Σ and the correlation matrix Σ, which is assumed to be non-singular, as follows:
C(u, v; Σ) = Φ_Σ( Φ^{−1}(u), Φ^{−1}(v) ),  (u, v) ∈ [0,1]^{d+1},
where Φ denotes the standard normal distribution and where we note that
Φ^{−1}(v) = ( Φ^{−1}(v_1), Φ^{−1}(v_2), …, Φ^{−1}(v_d) ).
To derive ∂_{q+1} C(u, v_d) and ∂_q C_X(v_d), let us first decompose the correlation matrix Σ as follows:
Σ = [ Σ_11  Σ_12 ; Σ_21  Σ_22 ],  with block sizes [ (q+1)×(q+1)  (q+1)×(d−q) ; (d−q)×(q+1)  (d−q)×(d−q) ],
where Σ_11 and Σ_22 represent the correlation matrices of the (q+1)-dimensional continuous random vector (Y, X_q) and the (d−q)-dimensional discrete random vector X_{d−q}, respectively. Furthermore, Σ_12 denotes the cross-correlation matrix between (Y, X_q) and X_{d−q}, and Σ_21 = Σ_12^⊤.
Consider the (d+1)-dimensional uniform random vector (U, V_1, …, V_d) with distribution C, and set V_q = (V_1, …, V_q) and V_{d−q} = (V_{q+1}, …, V_d). Then one observes, for all (u, v_q, v_{d−q}) ∈ (0,1) × (0,1)^q × (0,1)^{d−q},
∂_{q+1} C(u, v_d) = P( V_{d−q} ≤ v_{d−q} | U = u, V_q = v_q ) c_{q+1}(u, v_q),
where c_{q+1} is the copula density of (U, V_q). Let Φ^{−1}(V_{d−q}) = ( Φ^{−1}(V_{q+1}), …, Φ^{−1}(V_d) ) and Φ^{−1}(u, v_q) = ( Φ^{−1}(u), Φ^{−1}(v_1), …, Φ^{−1}(v_q) ). Since ( Φ^{−1}(U), Φ^{−1}(V_1), …, Φ^{−1}(V_d) ) is normally distributed, the conditional random vector Φ^{−1}(V_{d−q}) | { U = u, V_q = v_q } is distributed as
N( Σ_21 Σ_11^{−1} Φ^{−1}(u, v_q),  Σ_22 − Σ_21 Σ_11^{−1} Σ_21^⊤ ),
with distribution function G_{q+1}( · | Φ^{−1}(u, v_q) ). It follows that
∂_{q+1} C(u, v_d) = G_{q+1}( Φ^{−1}(v_{d−q}) | Φ^{−1}(u, v_q) ) c_{q+1}(u, v_q),
where c_{q+1}(u, v_q) is the copula density of the random vector (U, V_q). It remains to derive ∂_q C_X(v_d), where C_X is the Gaussian copula of X with correlation matrix
Σ̃ = [ Σ̃_11  Σ̃_12 ; Σ̃_21  Σ_22 ],  with block sizes [ q×q  q×(d−q) ; (d−q)×q  (d−q)×(d−q) ],
where Σ̃_11 and Σ_22 represent the correlation matrices of the q-dimensional continuous random vector X_q and the (d−q)-dimensional discrete random vector X_{d−q}, respectively. Similarly, Σ̃_12 denotes the cross-correlation matrix between X_q and X_{d−q}, and Σ̃_21 = Σ̃_12^⊤. It follows that, for all (v_q, v_{d−q}) ∈ (0,1)^q × (0,1)^{d−q},
∂_q C_X(v_d) = P( V_{d−q} ≤ v_{d−q} | V_q = v_q ) c_q(v_q),
where c_q is the copula density of the random vector V_q. Since the conditional random vector Φ^{−1}(V_{d−q}) | { V_q = v_q } is distributed as
N( Σ̃_21 Σ̃_11^{−1} Φ^{−1}(v_q),  Σ_22 − Σ̃_21 Σ̃_11^{−1} Σ̃_21^⊤ ),
with distribution function G_q( · | Φ^{−1}(v_q) ), it follows that
∂_q C_X(v_d) = G_q( Φ^{−1}(v_{d−q}) | Φ^{−1}(v_q) ) c_q(v_q).
Therefore, the predicted mean is given by
m(x) = E[ Y c_{q+1}( F_0(Y), F_q(x_q) ) Δ_q^d G_{q+1}( Φ^{−1}(F_{d−q}(x_{d−q})) | Φ^{−1}(F_0(Y), F_q(x_q)) ) ] / [ c_q( F_q(x_q) ) Δ_q^d G_q( Φ^{−1}(F_{d−q}(x_{d−q})) | Φ^{−1}(F_q(x_q)) ) ].
Example 2.
Consider the case d = 2 and q = 1, so that the covariate X_1 is continuous and the covariate X_2 is discrete. Assume further that the copula of (Y, X_1, X_2) is Gaussian with correlation matrix
Σ = [ 1  ρ_12  ρ_13 ; ρ_12  1  ρ_23 ; ρ_13  ρ_23  1 ].
In such a case, we have,
m(x) = E[ Y c_2( F_0(Y), F_1(x_1) ) Δ_1^2 G_2( Φ^{−1}(F_2(x_2)) | Φ^{−1}(F_0(Y), F_1(x_1)) ) ] / [ c_1( F_1(x_1) ) Δ_1^2 G_1( Φ^{−1}(F_2(x_2)) | Φ^{−1}(F_1(x_1)) ) ],
where c_1(F_1(x_1)) = 1. It remains to calculate the copula density c_2 of (U, V_1) and the conditional normal distributions G_1( · | Φ^{−1}(F_1(x_1)) ) and G_2( · | Φ^{−1}(F_0(Y), F_1(x_1)) ). Since the copula of (U, V_1) is Gaussian with correlation matrix Σ_11 = [ 1  ρ_12 ; ρ_12  1 ], the copula density c_2( F_0(Y), F_1(x_1) ) is given by
(1 − ρ_12²)^{−1/2} exp( [ 2 ρ_12 Φ^{−1}(F_0(Y)) Φ^{−1}(F_1(x_1)) − ρ_12² ( Φ^{−1}(F_0(Y))² + Φ^{−1}(F_1(x_1))² ) ] / ( 2 (1 − ρ_12²) ) ).
Also, we have
Σ_12 = Σ_21^⊤ = ( ρ_13, ρ_23 )^⊤,  Σ_22 = 1,  Σ̃_11 = 1,  Σ̃_12 = Σ̃_21 = ρ_23.
Standard calculations then show that G_2( · | Φ^{−1}(F_0(Y), F_1(x_1)) ) is the distribution function of N( μ(Y, x_1), σ² ), where
μ(Y, x_1) = [ (ρ_13 − ρ_12 ρ_23) Φ^{−1}(F_0(Y)) + (ρ_23 − ρ_12 ρ_13) Φ^{−1}(F_1(x_1)) ] / (1 − ρ_12²),
and
σ² = ( 1 − ρ_12² − ρ_13² − ρ_23² + 2 ρ_12 ρ_13 ρ_23 ) / (1 − ρ_12²).
Likewise, G_1( · | Φ^{−1}(F_1(x_1)) ) is the distribution function of N( μ̃(x_1), σ̃² ), where
μ̃(x_1) = ρ_23 Φ^{−1}(F_1(x_1))   and   σ̃² = 1 − ρ_23².
Therefore,
m(x) = E[ Y c_2( F_0(Y), F_1(x_1) ) Δ_1^2 Φ( σ^{−1} [ Φ^{−1}(F_2(x_2)) − μ(Y, x_1) ] ) ] / Δ_1^2 Φ( σ̃^{−1} [ Φ^{−1}(F_2(x_2)) − μ̃(x_1) ] ),
where
Δ_1^2 Φ( σ^{−1} [ Φ^{−1}(F_2(x_2)) − μ(Y, x_1) ] ) = Φ( σ^{−1} [ Φ^{−1}(F_2(x_2)) − μ(Y, x_1) ] ) − Φ( σ^{−1} [ Φ^{−1}(F_2(x_2 − 1)) − μ(Y, x_1) ] ),
and
Δ_1^2 Φ( σ̃^{−1} [ Φ^{−1}(F_2(x_2)) − μ̃(x_1) ] ) = Φ( σ̃^{−1} [ Φ^{−1}(F_2(x_2)) − μ̃(x_1) ] ) − Φ( σ̃^{−1} [ Φ^{−1}(F_2(x_2 − 1)) − μ̃(x_1) ] ).
Example 3.
As a continuation of Example 2, in order to give a closed form for the conditional mean m(x), we consider the case where the variables Y and X_1 are standard normal and where the correlation matrix of the Gaussian copula is determined by ρ_12 = 0 and ρ_13 = ρ_23 = 0.6. Thus, the conditional expectation is given by
m(x) = E[ Y ( Φ( 1.89 [ Φ^{−1}(F_2(x_2)) − 0.6 Y − 0.6 x_1 ] ) − Φ( 1.89 [ Φ^{−1}(F_2(x_2 − 1)) − 0.6 Y − 0.6 x_1 ] ) ) ] / ( Φ( 1.25 [ Φ^{−1}(F_2(x_2)) − 0.6 x_1 ] ) − Φ( 1.25 [ Φ^{−1}(F_2(x_2 − 1)) − 0.6 x_1 ] ) ),
for any discrete random variable X_2 and any (x_1, x_2) ∈ ℝ × Ran(X_2).
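The closed form of Example 3 can be evaluated numerically as follows; this is an illustrative sketch, and the three-point distribution of X_2 with p_1 = 0.3 and p_2 = 0.5 is an assumption borrowed from the simulation setting rather than part of the example. The formula is computed by Monte Carlo over Y and cross-checked against a conditional average obtained by simulating directly from the trivariate Gaussian copula.

```python
# Example 3: closed-form conditional mean vs. a direct simulation check.
import numpy as np
from scipy.stats import norm

p1, p2 = 0.3, 0.5
F2_vals = {0: 0.0, 1: p1, 2: p1 + p2, 3: 1.0}      # assumed discrete margin of X2

def m_example3(x1, x2, y_draws):
    a  = norm.ppf(F2_vals[x2])       # Phi^{-1}(F2(x2)); ppf(0) = -inf, ppf(1) = +inf
    am = norm.ppf(F2_vals[x2 - 1])
    num = np.mean(y_draws * (norm.cdf(1.89 * (a - 0.6 * y_draws - 0.6 * x1))
                             - norm.cdf(1.89 * (am - 0.6 * y_draws - 0.6 * x1))))
    den = norm.cdf(1.25 * (a - 0.6 * x1)) - norm.cdf(1.25 * (am - 0.6 * x1))
    return num / den

rng = np.random.default_rng(3)
y = rng.standard_normal(200_000)

# Direct simulation from the Gaussian copula with rho12 = 0, rho13 = rho23 = 0.6.
Sigma = np.array([[1.0, 0.0, 0.6], [0.0, 1.0, 0.6], [0.6, 0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(3), Sigma, size=400_000)
X2 = 1 + (norm.cdf(Z[:, 2]) >= p1).astype(int) + (norm.cdf(Z[:, 2]) >= p1 + p2).astype(int)

x1, x2 = 0.5, 2
mask = (np.abs(Z[:, 1] - x1) < 0.05) & (X2 == x2)
print("closed form :", m_example3(x1, x2, y))
print("simulation  :", Z[mask, 0].mean())
```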

4. Estimation

Consider a sample of n observations (Y_1, X_1), …, (Y_n, X_n) from the random vector (Y, X). For i = 1, …, n, denote X_i = (X_{i1}, …, X_{id}). To estimate the conditional mean described in (7), we need to estimate the marginal distributions F_0, F_1, …, F_d as well as the partial derivatives of the copulas C and C_X, namely ∂_{q+1}C and ∂_q C_X, given in (6). In this paper, we use a semi-parametric methodology that first estimates the margins F_0, F_1, …, F_d through their rescaled empirical distributions given by
F_{n0}(y) = (1/(n+1)) Σ_{i=1}^n I(Y_i ≤ y)   and   F_{nj}(x) = (1/(n+1)) Σ_{i=1}^n I(X_{ij} ≤ x),  j = 1, …, d,
respectively, where I(A) stands for the indicator function of a given event A. An alternative method for estimating these quantities is the kernel smoothing technique, which typically yields more accurate estimates than the empirical distributions. The idea behind this method is to estimate the distributions F_0, F_1, …, F_d using
F̂_0(y) = (1/(n+1)) Σ_{i=1}^n K( (y − Y_i)/h )   and   F̂_j(x) = (1/(n+1)) Σ_{i=1}^n K( (x − X_{ij})/h ),  j = 1, …, d.
Here, K(·) represents a non-negative kernel function; since a distribution function is being estimated, K is taken to be an integrated (smooth c.d.f.-type) kernel. The quantity h denotes the bandwidth. It is well known that the selection of the bandwidth is crucial and significantly influences the accuracy of the estimation.
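The two marginal estimators just described can be coded in a few lines; the sketch below (not the paper's code) implements the rescaled empirical distribution and a smoothed version in which the indicator is replaced by an integrated Gaussian kernel, with a Silverman-type bandwidth used purely as a default assumption.

```python
# Rescaled empirical c.d.f. and integrated-kernel smoothed c.d.f. estimators.
import numpy as np
from scipy.stats import norm

def empirical_cdf(sample):
    sample = np.sort(np.asarray(sample, dtype=float))
    n = sample.size
    return lambda y: np.searchsorted(sample, y, side="right") / (n + 1.0)

def smoothed_cdf(sample, h=None):
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    if h is None:                                  # Silverman-type bandwidth (assumption)
        h = 1.06 * sample.std(ddof=1) * n**(-1 / 5)
    return lambda y: norm.cdf((np.asarray(y)[..., None] - sample) / h).sum(axis=-1) / (n + 1.0)

rng = np.random.default_rng(4)
data = rng.standard_normal(300)
F_emp, F_smooth = empirical_cdf(data), smoothed_cdf(data)
for y0 in (-1.0, 0.0, 1.0):
    print(y0, float(F_emp(y0)), float(F_smooth(y0)), norm.cdf(y0))
```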
The second step is to estimate the copula of (Y, X) parametrically. To this end, assume that the copula C belongs to some parametric family F = { C_θ, θ ∈ Θ }, where Θ ⊆ ℝ. This means that there exists θ_0 ∈ Θ such that C = C_{θ_0} and C_X = C_{X,θ_0}. The copula C is then estimated by C_{θ̂}, where θ̂ is an estimator of θ_0, typically obtained by maximizing, with respect to θ, the pseudo-likelihood function
L(θ) = Σ_{i=1}^n ln c_θ( F̂_0(Y_i), F̂_d(X_i) ),
where F̂_d(X_i) = ( F̂_1(X_{i1}), …, F̂_d(X_{id}) ). In other words, the estimator of θ is given by
θ̂ = arg max_{θ ∈ Θ} L(θ).
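The pseudo-likelihood step can be illustrated for a trivariate Clayton copula as follows. This is a hedged sketch rather than the authors' implementation: the Gamma-frailty sampler and the explicit Clayton density are standard results, the sample size and optimization bounds are arbitrary, and for simplicity all three margins are treated as continuous when forming the pseudo-observations.

```python
# Pseudo-maximum-likelihood estimation of theta for a trivariate Clayton copula.
import numpy as np
from scipy.optimize import minimize_scalar

def sample_clayton(n, theta, rng):
    """Marshall-Olkin (Gamma frailty) sampler for the trivariate Clayton copula."""
    w = rng.gamma(1.0 / theta, 1.0, size=(n, 1))
    e = rng.exponential(1.0, size=(n, 3))
    return (1.0 + e / w)**(-1.0 / theta)

def clayton_loglik(theta, u):
    """Log density of the trivariate Clayton copula summed over the rows of u."""
    s = np.sum(u**(-theta), axis=1) - 2.0
    return np.sum(np.log(1.0 + theta) + np.log(1.0 + 2.0 * theta)
                  - (theta + 1.0) * np.log(u).sum(axis=1)
                  - (1.0 / theta + 3.0) * np.log(s))

rng = np.random.default_rng(5)
u_true = sample_clayton(500, theta=2.0, rng=rng)

# Rescaled empirical margins applied column-wise: rank_i / (n + 1).
ranks = np.argsort(np.argsort(u_true, axis=0), axis=0) + 1
u_hat = ranks / (u_true.shape[0] + 1.0)

fit = minimize_scalar(lambda t: -clayton_loglik(t, u_hat), bounds=(0.05, 20.0), method="bounded")
print("estimated theta:", fit.x)   # should be close to the true value 2.0
```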
Finally, the conditional mean m ( x ) is estimated using (7) as follows:
m̂(x) = (1/n) Σ_{i=1}^n Y_i Δ_q^d ∂_{q+1} C_{θ̂}( F̂_0(Y_i), F̂_d(x) ) / Δ_q^d ∂_q C_{X,θ̂}( F̂_d(x) ).     (19)
Example 4.
Let us examine the above estimation procedure in the scenario where d = 2 and q = 1, so that X_1 is a continuous covariate and X_2 is discrete. Additionally, let us assume that the copula governing (Y, X_1, X_2) is the Clayton copula with parameter θ ∈ (0, ∞). The theoretical conditional mean m(x) is provided in (12). Its estimated counterpart can be derived from (19) as follows:
m̂(x) = (1 + θ̂) Σ_{i=1}^n Y_i F̂_0(Y_i)^{−θ̂−1} Δ_1^2 ( F̂_0(Y_i)^{−θ̂} + F̂_1(x_1)^{−θ̂} + F̂_2(x_2)^{−θ̂} − 2 )^{−1/θ̂ − 2} / [ n Δ_1^2 ( F̂_1(x_1)^{−θ̂} + F̂_2(x_2)^{−θ̂} − 1 )^{−1/θ̂ − 1} ].
The estimators θ̂, F̂_0, F̂_1, and F̂_2 can be computed from a sample (Y_i, X_{1i}, X_{2i}), i = 1, …, n, drawn from the distribution of (Y, X_1, X_2). Similarly, in the case where the dependence structure of (Y, X_1, X_2) is modeled by a Frank copula, the estimated conditional mean m̂(x) can be derived from (13) and (19) as follows:
m̂(x) = [ 1 / Δ_1^2 K_{2,θ̂}( F̂_1(x_1), F̂_2(x_2) ) ] (1/n) Σ_{i=1}^n Y_i Δ_1^2 K_{1,θ̂}( F̂_0(Y_i), F̂_1(x_1), F̂_2(x_2) ),
where K_{1,θ} and K_{2,θ} are given in (14) and (15), respectively.

5. Simulation Study

The objective of this section is to conduct simulations to compare the proposed conditional mean estimator with some competitors. To achieve this, we focus on the case d = 2 with mixed covariates; specifically, X_1 is continuous and X_2 is discrete. In this case, the proposed estimator is deduced from its general form (19) as follows:
m̂(x_1, x_2) = (1/n) Σ_{i=1}^n Y_i [ ∂_2 C_{θ̂}( F̂_0(Y_i), F̂_1(x_1), F̂_2(x_2) ) − ∂_2 C_{θ̂}( F̂_0(Y_i), F̂_1(x_1), F̂_2(x_2 − 1) ) ] / [ ∂_1 C_{X,θ̂}( F̂_1(x_1), F̂_2(x_2) ) − ∂_1 C_{X,θ̂}( F̂_1(x_1), F̂_2(x_2 − 1) ) ],
where
∂_2 C_{θ̂}(u, v_1, v_2) = ∂² C_{θ̂}(u, v_1, v_2) / (∂u ∂v_1)   and   ∂_1 C_{X,θ̂}(v_1, v_2) = ∂ C_{X,θ̂}(v_1, v_2) / ∂v_1.
As scenarios, we consider the most common cases in order to show the improvement of our estimator over the OLS estimator. For the copula of (Y, X_1, X_2), we consider the Clayton, Frank, and Gumbel families with dependence parameter θ (θ > 0 for Clayton and Frank, θ ≥ 1 for Gumbel); for the response, Y ~ N(μ, σ²) or Y ~ t(df); while X_1 ~ U[a, b] and X_2 ∈ {1, 2, 3} with distribution F_2(1) = p_1, F_2(2) = p_1 + p_2, and F_2(3) = 1. The generalized inverse of F_2 is
F_2^{−1}(t) = inf{ v ∈ {1, 2, 3} : F_2(v) > t },
or equivalently,
F_2^{−1}(t) = I(0 ≤ t < p_1) + 2 I(p_1 ≤ t < p_1 + p_2) + 3 I(p_1 + p_2 ≤ t ≤ 1).
Simulation algorithm (a Python sketch is given after this list):
  • Fix n, a, b, p_1, p_2, μ, σ², and θ.
  • For i = 1, …, n:
    – generate (u_i, v_{i,1}, v_{i,2}) from the copula C_θ;
    – set y_i = F_0^{−1}(u_i), x_{i,1} = F_1^{−1}(v_{i,1}), and x_{i,2} = F_2^{−1}(v_{i,2}).
  • Use the generated sample (y_i, x_{i,1}, x_{i,2}), i = 1, …, n, to estimate θ and to define the empirical distributions of F_0, F_1, and F_2.
  • Evaluate the estimator m̂(x_1, x_2) for (x_1, x_2) belonging to the grid defined by
    F = E × {1, 2, 3},  where E = { i/K, i = 1, …, K }.
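Below is a hedged end-to-end sketch of this algorithm for the Clayton copula with Y ~ N(0,1), X_1 ~ U(0,1), X_2 ∈ {1, 2, 3} (p_1 = 0.3, p_2 = 0.5), n = 200, and K = 10. All concrete settings are illustrative and need not match the configuration used for the tables; in particular, the pseudo-likelihood below treats the discrete margin as if it were continuous, a simplification made only for this sketch.

```python
# End-to-end simulation sketch: generate data, estimate theta, evaluate m_hat vs. OLS.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(7)
n, K, theta0, p1, p2 = 200, 10, 2.0, 0.3, 0.5

# Step 1: generate (u, v1, v2) from the Clayton copula (Gamma frailty sampler).
w = rng.gamma(1.0 / theta0, 1.0, size=(n, 1))
uvw = (1.0 + rng.exponential(1.0, size=(n, 3)) / w)**(-1.0 / theta0)

# Step 2: apply the inverse margins.
F2_cdf = lambda k: np.where(k < 1, 0.0, np.where(k < 2, p1, np.where(k < 3, p1 + p2, 1.0)))
y  = norm.ppf(uvw[:, 0])                                       # F0^{-1}, N(0, 1)
x1 = uvw[:, 1]                                                 # F1^{-1}, U(0, 1)
x2 = 1 + (uvw[:, 2] >= p1) + (uvw[:, 2] >= p1 + p2)            # F2^{-1} on {1, 2, 3}

# Step 3: rescaled empirical margins and pseudo-MLE of theta.
ecdf = lambda s: (lambda t: np.searchsorted(np.sort(s), t, side="right") / (len(s) + 1.0))
F0h, F1h = ecdf(y), ecdf(x1)

def neg_loglik(th):
    u = np.column_stack([F0h(y), F1h(x1), F2_cdf(x2)])         # pseudo-observations
    s = np.sum(u**(-th), axis=1) - 2.0
    return -np.sum(np.log(1 + th) + np.log(1 + 2 * th)
                   - (th + 1) * np.log(u).sum(axis=1) - (1 / th + 3) * np.log(s))

theta_hat = minimize_scalar(neg_loglik, bounds=(0.05, 20.0), method="bounded").x

# Step 4: evaluate the Clayton-based estimator m_hat on the grid E x {1, 2, 3}.
def m_hat(t1, t2, th):
    u0, u1 = F0h(y), F1h(t1)
    term = lambda v2: np.where(v2 > 0,
                               (u0**(-th) + u1**(-th) + np.maximum(v2, 1e-12)**(-th) - 2)**(-1 / th - 2), 0.0)
    den  = lambda v2: np.where(v2 > 0,
                               (u1**(-th) + np.maximum(v2, 1e-12)**(-th) - 1)**(-1 / th - 1), 0.0)
    num = np.mean(y * u0**(-th - 1) * (term(F2_cdf(t2)) - term(F2_cdf(t2 - 1))))
    return (1 + th) * num / (den(F2_cdf(t2)) - den(F2_cdf(t2 - 1)))

# OLS competitor fitted on the same sample.
beta = np.linalg.lstsq(np.column_stack([np.ones(n), x1, x2]), y, rcond=None)[0]

grid_E = np.arange(1, K + 1) / K
for t2 in (1, 2, 3):
    for t1 in grid_E[:3]:                                      # print a few grid points
        ols = beta @ np.array([1.0, t1, t2])
        print(f"x1={t1:.1f}, x2={t2}: copula {m_hat(t1, t2, theta_hat):+.3f}  ols {ols:+.3f}")
```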
For fixed (x_1, x_2) ∈ F, we first compute the theoretical value m(x_1, x_2) and then evaluate m̂(x_1, x_2) using J random samples of size n. We denote the corresponding estimates by m̂^{(j)}(x_1, x_2), j = 1, …, J. To assess the performance, we employ the empirical integrated mean squared error (IMSE), defined as
IMSE = (1/J) Σ_{j=1}^J (1/|F|) Σ_{(x_1, x_2) ∈ F} ( m̂^{(j)}(x_1, x_2) − m(x_1, x_2) )²,
where |F| denotes the cardinality of the grid F. Notably, the IMSE can be decomposed into the square of the empirical bias, IBIAS², and the empirical variance, IVAR, as follows:
IMSE = IBIAS² + IVAR.
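As a quick check of this decomposition, the snippet below verifies numerically that the empirical IMSE splits exactly into IBIAS² + IVAR when the variance is computed with the 1/J normalization; the arrays are hypothetical placeholders for m̂^{(j)}(x_1, x_2) and m(x_1, x_2).

```python
# Empirical IMSE = IBIAS^2 + IVAR on placeholder replicated estimates.
import numpy as np

rng = np.random.default_rng(8)
J, grid_size = 100, 30
m_true = rng.normal(size=grid_size)                            # placeholder for m(x1, x2)
m_hat = m_true + 0.1 + 0.3 * rng.normal(size=(J, grid_size))   # placeholder estimates

imse   = np.mean((m_hat - m_true)**2)
ibias2 = np.mean((m_hat.mean(axis=0) - m_true)**2)
ivar   = np.mean(m_hat.var(axis=0, ddof=0))
print(imse, ibias2 + ivar)   # the two numbers coincide up to floating-point error
```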
In this simulation study, different values of the parameters are considered, representing different dependence scenarios ranging from weak to strong, with Kendall's tau (τ) values lying in the interval (0.2, 0.75). With a sample size of n = 200, the response Y is generated from the N(0, 1) distribution and from Student's t-distribution with 3 degrees of freedom, while X_1 is generated from a Uniform(0, 1) distribution and X_2 ∈ {1, 2, 3} with distribution F_2(1) = p_1, F_2(2) = p_1 + p_2, and F_2(3) = 1, where p_1 = 0.3 and p_2 = 0.5. In this context, we report and compare the integrated mean squared error (IMSE) and the integrated mean absolute error (IMAE) with the respective errors obtained from the least squares (ls) regression method. This comprehensive approach ensures the reliability of the comparison by accounting for variability in outcomes across multiple realizations. The values reported in Table 1 and Table 2, corresponding to the normal distribution and Student's t-distribution, respectively, are averages over 100 realizations. The results show that the proposed method outperformed the least squares regression method in terms of IMSE in nearly all the scenarios considered and, in most of them, also in terms of IMAE, across the range of Kendall's tau values. We also analyzed the evolution of the IMSE with the sample size, confirming a clear reduction as n grows, which improves the accuracy and stability of the estimator (see Table 3). Specifically, we considered n = 50 and n = 100, which are relatively small; as n increases, the estimator improves significantly in terms of IMSE.
In particular, the proposed method showed more accurate and robust performance, yielding lower IMSE and IMAE than the least squares method. This enhanced performance can be attributed to the proposed method's ability to capture and account for the underlying dependence structure, summarized by Kendall's tau, more effectively. Unlike the least squares method, which assumes a specific (linear) form of relationship, the proposed method offers a more flexible and robust approach to analyzing data with varying degrees of correlation and complexity.

6. Conclusions

This paper extends the copula-based regression model introduced by Noh et al. [5] by addressing the scenario where covariates are mixed, encompassing both continuous and discrete explanatory variables. Unlike the original model, which dealt exclusively with continuous covariates, the proposed approach broadens the applicability of the copula-based regression framework. The parameter estimation has been performed using the inference function for margins (IFM), which first estimates the marginal parameters and then estimates the corresponding dependence parameter. Through detailed examples, we demonstrated the estimation of the proposed regression equation and conducted a comprehensive simulation study under various scenarios involving different types of copulas. The results of the simulation study indicate that the suggested model performs favorably compared to classical regression approaches, showcasing its potential to handle mixed-covariate data effectively. This extension provides a valuable contribution to the field of regression analysis, offering a new regression tool for researchers and practitioners dealing with diverse explanatory data types. An interesting potential research direction involves extending this concept to regression with multivariate responses using the same mixed covariates. This extension is particularly relevant in various practical applications where multiple outcomes need to be modeled simultaneously. For instance, in environmental studies, multivariate regression can be used to assess how industrial emissions simultaneously impact both air and water quality, accounting for the complex interactions between pollutants. In healthcare, it enables researchers to examine how lifestyle factors influence multiple health outcomes, such as blood pressure, cholesterol levels, and blood sugar levels.

Author Contributions

Writing—review & editing, S.A., O.K. and M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by Grant CARP2023 from the College of Business and Economics at UAE University, for the Promotion of research.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

We would like to express our sincere thanks to the anonymous referees for their constructive comments and suggestions, which improved the earlier version of our paper. We are also very grateful to UAE University Research Affairs for funding the APC.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Nelsen, R.B. An Introduction to Copulas; Springer Series in Statistics; Springer: New York, NY, USA, 2006.
2. Sheikhi, A.; Arad, F.; Mesiar, R. A heteroscedasticity diagnostic of a regression analysis with copula dependent random variables. Braz. J. Probab. Stat. 2022, 36, 408–419.
3. Ali, A.; Pathak, A.K.; Arshad, M.; Emura, T. Copula-based regression estimation in the presence of outliers. Commun. Stat. Simul. Comput. 2024, 1–26.
4. Sklar, M. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 1959, 8, 229–231.
5. Noh, H.; Ghouch, A.E.; Bouezmarni, T. Copula-based regression estimation and inference. J. Am. Stat. Assoc. 2013, 108, 676–688.
6. Noh, H.; Ghouch, A.E.; Van Keilegom, I. Semiparametric conditional quantile estimation through copula-based multivariate models. J. Bus. Econ. Stat. 2015, 33, 167–178.
7. De Backer, M.; El Ghouch, A.; Van Keilegom, I. Semiparametric copula quantile regression for complete or censored data. Electron. J. Stat. 2017, 11, 1660–1698.
8. Kraus, D.; Czado, C. D-vine copula based quantile regression. Comput. Stat. Data Anal. 2017, 110, 1–18.
9. Rémillard, B.; Nasri, B.; Bouezmarni, T. On copula-based conditional quantile estimators. Stat. Probab. Lett. 2017, 128, 14–20.
10. Chang, B.; Joe, H. Prediction based on conditional distributions of vine copulas. Comput. Stat. Data Anal. 2019, 139, 45–63.
11. Nagler, T.; Vatter, T. Solving estimating equations with copulas. J. Am. Stat. Assoc. 2023, 119, 1168–1180.
12. Newey, W.K.; Powell, J.L. Asymmetric least squares estimation and testing. Econometrica 1987, 55, 819–847.
13. Coia, V.; Joe, H.; Nolde, N. Copula-based conditional tail indices. J. Multivar. Anal. 2023, 201, 105268.
14. Mesfioui, M.; Bouezmarni, T.; Belalia, M. Copula-based link functions in binary regression models. Stat. Pap. 2023, 64, 557–585.
15. Smith, M.S. Implicit copulas: An overview. Econom. Stat. 2023, 28, 81–104.
16. Hans, N.; Klein, N.; Faschingbauer, F.; Schneider, M.; Mayr, A. Boosting distributional copula regression. Biometrics 2023, 79, 2298–2310.
17. Nazeri Tahroudi, M.; Ramezani, Y.; De Michele, C.; Mirabbasi, R. Application of copula-based approach as a new data-driven model for downscaling the mean daily temperature. Int. J. Climatol. 2023, 43, 240–254.
18. McNeil, A.J.; Nešlehová, J. Multivariate Archimedean copulas, d-monotone functions and ℓ1-norm symmetric distributions. Ann. Statist. 2009, 37, 3059–3097.
Table 1. Simulation results for the normal distribution with n = 200.

Copula   | Parameters                                        | IMSE m̂_c | IMSE m̂_ls | IMAE m̂_c | IMAE m̂_ls
Clayton  | θ = 0.50 (τ = 0.20)                               | 0.005    | 0.015     | 0.084    | 0.071
Clayton  | θ = 2.00 (τ = 0.50)                               | 0.012    | 0.033     | 0.116    | 0.104
Clayton  | θ = 6.00 (τ = 0.75)                               | 0.032    | 0.095     | 0.147    | 0.136
Frank    | θ = 2.37 (τ = 0.25)                               | 0.00026  | 0.120     | 0.011    | 0.283
Frank    | θ = 5.73 (τ = 0.50)                               | 0.00004  | 0.305     | 0.004    | 0.459
Frank    | θ = 14.14 (τ = 0.75)                              | 0.00001  | 0.613     | 0.002    | 0.670
Gumbel   | θ = 1.25 (τ = 0.20)                               | 0.020    | 0.050     | 0.105    | 0.151
Gumbel   | θ = 2.00 (τ = 0.50)                               | 0.036    | 0.095     | 0.134    | 0.208
Gumbel   | θ = 4.00 (τ = 0.75)                               | 0.062    | 0.232     | 0.144    | 0.285
Gaussian | ρ_12 = 0.4, ρ_13 = 0.4, ρ_23 = 0.4 (τ = 0.26)     | 0.079    | 0.080     | 0.236    | 0.234
Gaussian | ρ_12 = 0.9, ρ_13 = 0.9, ρ_23 = 0.85 (τ = 0.69)    | 0.051    | 0.071     | 0.172    | 0.210
Table 2. Simulation results for Student’s t with df = 3 and n = 200.

Copula   | Parameters                                        | IMSE m̂_c | IMSE m̂_ls | IMAE m̂_c | IMAE m̂_ls
Clayton  | θ = 0.50 (τ = 0.20)                               | 0.011    | 0.044     | 0.121    | 0.121
Clayton  | θ = 2.00 (τ = 0.50)                               | 0.022    | 0.079     | 0.157    | 0.158
Clayton  | θ = 6.00 (τ = 0.75)                               | 0.061    | 0.291     | 0.206    | 0.247
Frank    | θ = 2.37 (τ = 0.25)                               | 0.00112  | 0.270     | 0.023    | 0.420
Frank    | θ = 5.73 (τ = 0.50)                               | 0.00024  | 0.673     | 0.010    | 0.679
Frank    | θ = 14.14 (τ = 0.75)                              | 0.00016  | 1.556     | 0.008    | 1.059
Gumbel   | θ = 1.25 (τ = 0.20)                               | 0.051    | 0.138     | 0.157    | 0.246
Gumbel   | θ = 2.00 (τ = 0.50)                               | 0.076    | 0.264     | 0.199    | 0.352
Gumbel   | θ = 4.00 (τ = 0.75)                               | 0.120    | 0.709     | 0.194    | 0.576
Gaussian | ρ_12 = 0.4, ρ_13 = 0.4, ρ_23 = 0.4 (τ = 0.26)     | 0.400    | 0.525     | 0.431    | 0.448
Gaussian | ρ_12 = 0.9, ρ_13 = 0.9, ρ_23 = 0.85 (τ = 0.69)    | 0.087    | 0.301     | 0.220    | 0.435
Table 3. Simulation results (IMSE) for the normal distribution with n = 50 and n = 100.

Copula   | Parameters                                        | n = 50, m̂_c | n = 50, m̂_ls | n = 100, m̂_c | n = 100, m̂_ls
Clayton  | θ = 0.50 (τ = 0.20)                               | 0.031       | 0.079        | 0.014        | 0.042
Clayton  | θ = 2.00 (τ = 0.50)                               | 0.047       | 0.103        | 0.031        | 0.080
Clayton  | θ = 6.00 (τ = 0.75)                               | 0.078       | 0.257        | 0.070        | 0.199
Frank    | θ = 2.37 (τ = 0.25)                               | 0.001       | 0.159        | 0.0006       | 0.138
Frank    | θ = 5.73 (τ = 0.50)                               | 0.00017     | 0.371        | 0.00009      | 0.306
Frank    | θ = 14.14 (τ = 0.75)                              | 0.00006     | 0.661        | 0.00002      | 0.598
Gumbel   | θ = 1.25 (τ = 0.20)                               | 0.027       | 0.088        | 0.026        | 0.067
Gumbel   | θ = 2.00 (τ = 0.50)                               | 0.059       | 0.127        | 0.043        | 0.103
Gumbel   | θ = 4.00 (τ = 0.75)                               | 0.085       | 0.269        | 0.064        | 0.237
Gaussian | ρ_12 = 0.4, ρ_13 = 0.4, ρ_23 = 0.4 (τ = 0.26)     | 0.127       | 0.113        | 0.101        | 0.095
Gaussian | ρ_12 = 0.9, ρ_13 = 0.9, ρ_23 = 0.85 (τ = 0.69)    | 0.362       | 0.091        | 0.101        | 0.078

