1. Introduction
We consider the partial functional multiplicative regression model (PFMRM), which includes some scalar covariates and a functional predictor, paired with a positive scalar response. That is,

Y = exp( Z'β + ∫_T X(t) α(t) dt ) ε,  (1)
where Y is a positive response variable; Z is a p-dimensional vector covariate; β is the p-vector of slope coefficients, in which p is assumed to be fixed; α(·) is the unknown slope function associated with the functional predictor X(·); and ε is a positive random error independent of Z and X. Here, the Hilbert space L²(T) is the set of all square-integrable functions on T, endowed with the inner product ⟨f, g⟩ = ∫_T f(t)g(t) dt and the norm ‖f‖ = ⟨f, f⟩^{1/2}
. Model (1) generalizes both the classic multiplicative regression model [1] and the functional multiplicative regression model [2], which correspond to the cases α(·) ≡ 0 and β = 0, respectively. When the log transformation is applied, the above model simply reduces to a partial functional linear regression model [3]. When the response variable Y is a failure time, model (1) is called the functional accelerated failure time model in survival analysis; see [4] for example. For simplicity of notation, we assume throughout the study that T = [0, 1] and that Z and X have zero mean.
In many applications, the response variable is positive; for example, survival times, stock prices, incomes, body fat levels, emissions of nitrogen oxides, and the values of owner-occupied homes frequently arise in statistical practice. The multiplicative regression model plays an important role in describing these types of data. To estimate multiplicative regression models, Refs. [1,5] proposed the least absolute relative error (LARE) and least product relative error (LPRE) estimation, respectively. The LARE criterion minimizes Σ_{i=1}^n { |(Y_i − exp(Z_i'β))/Y_i| + |(Y_i − exp(Z_i'β))/exp(Z_i'β)| }, and the LPRE criterion minimizes Σ_{i=1}^n (Y_i − exp(Z_i'β))² / (Y_i exp(Z_i'β)), which is equivalent to minimizing Σ_{i=1}^n { Y_i exp(−Z_i'β) + Y_i⁻¹ exp(Z_i'β) }. As pointed out by [5], the LARE estimation is robust and scale-free, but its optimization may be challenging because the objective function being minimized is non-smooth. In addition, confidence intervals for the parameters are not very accurate due to the complexity of the asymptotic covariance matrix, which involves the density of the model error. To overcome these shortcomings of LARE, Ref. [5] proposed the LPRE criterion, which is strictly convex and infinitely differentiable, so the optimization procedure is much easier. In recent years, owing to the excellent properties of LARE and LPRE estimation, scholars in various fields have been attracted to conducting extended research on them; readers can refer to [6,7,8].
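For concreteness, the two criteria above can be sketched in code for the classic (non-functional) multiplicative model Y_i = exp(Z_i'β)ε_i. This is only an illustrative NumPy sketch; the function names and setup are our own, not part of any referenced implementation.

```python
import numpy as np

def lare_loss(beta, Z, Y):
    # LARE: sum of |(Y - f)/Y| + |(Y - f)/f| with f = exp(Z @ beta).
    # Non-smooth at points where Y_i = f_i, which complicates optimization.
    f = np.exp(Z @ beta)
    return np.sum(np.abs((Y - f) / Y) + np.abs((Y - f) / f))

def lpre_loss(beta, Z, Y):
    # LPRE: sum of (Y - f)^2 / (Y * f), which expands to Y/f + f/Y - 2.
    # Smooth and strictly convex in beta.
    f = np.exp(Z @ beta)
    return np.sum(Y / f + f / Y - 2.0)
```

With exact errors ε_i = 1, both losses vanish at the true β, and the LPRE loss grows strictly as β moves away, reflecting its strict convexity.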
For functional multiplicative models, to the best of our knowledge, there are only a few works and all of them focus on the above two criteria. For example, Ref. [
9] developed the functional quadratic multiplicative model and derived the asymptotic properties of the estimator with the LARE criterion. Later, Refs. [
2,
10] considered the variable selection for partially and locally sparse functional linear multiplicative models based on the LARE criterion. In this paper, we consider the modeling of a positive scalar response variable with both scalar and functional predictors under the PFMRM. The above two criteria are employed to estimate the parametric vector β and the slope function α(·) in model (1).
The major contributions of this paper are four-fold. First, this study extends the LPRE criterion to the estimation of functional regression models. Second, we approximate the unknown slope function and the functional predictor using the functional principal component analysis technique, derive the convergence rates of the slope function estimator, and establish the asymptotic normality of the parameter vector under mild regularity conditions for the two estimation methods. Third, we develop an iterative algorithm to solve the involved optimization problem and propose a data-driven procedure to select the tuning parameter. Finally, we conduct extensive numerical studies to examine the finite-sample performance of the proposed methods and find that the LPRE method performs better than the LARE, least squares, and least absolute deviation methods.
The rest of the article is organized as follows.
Section 2 describes the detailed estimation procedures for model (
1).
Section 3 is dedicated to the asymptotic study of our estimators. The feasible algorithm for estimations of the parameters and nonparametric functions of PFMRM is proposed based on the LPRE criterion and presented in
Section 4.
Section 5 conducts simulation studies to evaluate the finite sample performance of the proposed methods. In
Section 6, we apply the proposed method to the Tecator data. The article concludes with a discussion in
Section 7. Proofs are provided in
Appendix A.
2. Estimation Method
Let (Y_i, Z_i, X_i), i = 1, …, n, be independent realizations of (Y, Z, X) generated from model (1), that is,

Y_i = exp( Z_i'β + ∫_T X_i(t) α(t) dt ) ε_i,  (2)

where the random errors ε_i, i = 1, …, n, are independent and identically distributed (i.i.d.) and independent of Z_i and X_i.
The covariance and empirical covariance functions of X can be defined as

C(s, t) = Cov(X(s), X(t)),   Ĉ(s, t) = n⁻¹ Σ_{i=1}^n X_i(s) X_i(t).

According to Mercer’s theorem, the spectral expansions of C and Ĉ can be written as

C(s, t) = Σ_{j≥1} λ_j φ_j(s) φ_j(t),   Ĉ(s, t) = Σ_{j≥1} λ̂_j φ̂_j(s) φ̂_j(t),

where {λ_j} and {λ̂_j} are the ordered eigenvalue sequences of the linear operators with kernels C and Ĉ, respectively, and {φ_j} and {φ̂_j} are the corresponding orthonormal eigenfunction sequences. With a slight abuse of notation, we use C to denote both the covariance operator and the covariance function of X. We assume that the covariance operator C, defined by (Cf)(s) = ∫_T C(s, t) f(t) dt, is strictly positive. In addition, Ĉ can be regarded as an estimator of C.
On the basis of the Karhunen–Loève decomposition, X_i and α can be expanded as

X_i(t) = Σ_{j≥1} ξ_{ij} φ_j(t),   α(t) = Σ_{j≥1} a_j φ_j(t),  (3)

where a_j = ⟨α, φ_j⟩, and ξ_{ij} = ⟨X_i, φ_j⟩ represents the coordinate of the ith curve with respect to the jth eigenbasis.
Analogously, the corresponding empirical counterparts are defined by replacing the eigenfunctions φ_j with their estimates φ̂_j; in particular, the estimated scores are ξ̂_{ij} = ⟨X_i, φ̂_j⟩.
Given the orthogonality of {φ_j} and (3), model (2) can be rewritten as

Y_i ≈ exp( Z_i'β + Σ_{j=1}^m ξ_{ij} a_j ) ε_i,  (4)

where the approximation arises from truncating the expansion of the functional term at the first m components, and the truncation parameter m → ∞ as n → ∞.
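The empirical eigendecomposition and scores above can be sketched as follows, assuming the centered curves are observed on an equally spaced grid and integrals are approximated by Riemann sums. All names are illustrative, and the discretization scheme is our own assumption.

```python
import numpy as np

def fpca_scores(Xc, t, m):
    """Empirical eigenvalues, eigenfunctions, and scores from centered curves
    Xc (n x len(t)) observed on an equally spaced grid t, keeping the first
    m principal components."""
    n, _ = Xc.shape
    dt = t[1] - t[0]
    C_hat = Xc.T @ Xc / n                   # empirical covariance on the grid
    # The integral operator with kernel C_hat is approximated by C_hat * dt.
    evals, evecs = np.linalg.eigh(C_hat * dt)
    order = np.argsort(evals)[::-1][:m]     # largest m eigenvalues first
    lam = evals[order]                      # estimated eigenvalues
    phi = evecs[:, order] / np.sqrt(dt)     # eigenfunctions, L2-normalized
    xi = Xc @ phi * dt                      # scores: xi_ij = <X_i, phi_j>
    return lam, phi, xi
```

Dividing the eigenvectors by sqrt(dt) converts Euclidean normalization on the grid into approximate L² normalization, so the scores approximate the inner products ⟨X_i, φ̂_j⟩.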
2.1. LARE Estimation
This approach is based on the LARE criterion of [1], with ξ_{ij} replaced by its estimator ξ̂_{ij} = ⟨X_i, φ̂_j⟩. The LARE estimates of model (1) can be obtained by minimizing the following loss function:

(β̂, â) = argmin Σ_{i=1}^n { | 1 − Y_i⁻¹ exp( Z_i'β + Σ_{j=1}^m ξ̂_{ij} a_j ) | + | 1 − Y_i exp( −Z_i'β − Σ_{j=1}^m ξ̂_{ij} a_j ) | },

where a = (a_1, …, a_m)'. Moreover, we can obtain the LARE estimator of the slope function as α̂(t) = Σ_{j=1}^m â_j φ̂_j(t).
2.2. LPRE Estimation
This approach is based on the LPRE criterion of [5], with ξ_{ij} replaced by its estimator ξ̂_{ij}. The LPRE estimates of model (1) can be obtained by minimizing the following loss function:

(β̂, â) = argmin Σ_{i=1}^n { Y_i exp( −Z_i'β − Σ_{j=1}^m ξ̂_{ij} a_j ) + Y_i⁻¹ exp( Z_i'β + Σ_{j=1}^m ξ̂_{ij} a_j ) − 2 },  (5)

where a = (a_1, …, a_m)'. Moreover, we can obtain the LPRE estimator of the slope function as α̂(t) = Σ_{j=1}^m â_j φ̂_j(t).
3. Asymptotic Properties
In this section, we establish the asymptotic properties of the estimators. Formulating the results requires the following technical assumptions. First, we present some notation. Suppose that β₀ and α₀ are the true values of β and α, respectively, and let a₀ be the true score coefficient vector. The notation ‖·‖ denotes the L² norm for a function or the Euclidean norm for a vector. In what follows, c denotes a generic positive constant that may take various values. Moreover, x_n ≍ y_n implies that x_n/y_n is bounded away from zero and infinity as n → ∞.
- C1.
The random process X and its scores satisfy the following conditions:
- C2.
For the eigenvalues of the linear operator and the score coefficients, the following conditions hold:
There exist some constants c and such that ;
There exist some constants c and such that .
- C3.
The tuning parameter .
- C4.
For the random vector ,
- C5.
There exists some constant c such that .
- C6.
Let
with
, for each
k, then
are independent and identically distributed random variables. Assume that
where
is the
kth diagonal element of
with
, and
is a positive definite matrix.
- C7.
The error ε has a continuous density in a neighborhood of 1 and is independent of (Z, X).
- C8.
, , and .
- C9.
, .
Remark 1. C1–C3 are standard assumptions used in classical functional linear regression (see, e.g., [11,12]). More specifically, C1 is needed for the consistency of the resulting estimators. C2(a) is required to identify the slope function by preventing the spacing between the eigenvalues from being too small, while C2(b) is used to make the slope function sufficiently smooth. C3 is required to obtain the convergence rate of the slope function estimator. C4–C6 are used to handle the linear part of the vector-type covariate in the model, and are similar to [3,13]. C4 is a little stronger than those in classical linear models and is primarily used to ensure the asymptotic behavior of the estimators. C5 makes the effect of truncation on the estimation of the parametric component small enough. Notably, the quantity in C6 is the regression error of the scalar covariates on the functional scores, and the conditions in C6 essentially restrict the scalar covariates to being only linearly related to the functional predictor. C6 is also used to establish the asymptotic normality of the parameter estimator, in a manner similar to that applied in [3,13] for modeling the dependence between the parametric and nonparametric components. C7 and C8 are standard assumptions on the random errors for the LARE estimator, as used in [1]. C9 is the standard assumption on the random errors for the LPRE estimator, as used in [5]. The following two theorems present the convergence rate of the slope function estimator
and establish the asymptotic normality of the parameter estimator
, respectively, with the LARE method introduced in
Section 2.1 above.
Theorem 1. If conditions C1–C8 hold, then Theorem 2. If conditions C1–C8 hold, as n → ∞, we have where represents convergence in distribution, . The following two theorems give the rate of convergence of the slope function and the asymptotic normality of the parameter vector, respectively, with the LPRE method introduced in
Section 2.2 above.
Theorem 3. Suppose conditions C1–C6 and C9 hold; then, Theorem 4. Suppose conditions C1–C6 and C9 hold; as n → ∞, we have where . Remark 2. The convergence rate of the slope function obtained in Theorems 1 and 3 is the same as that of [12,13], which is optimal in the minimax sense. The variance in Theorems 2 and 4 involves the random error density function, which is a standard feature of multiplicative regression models. One can consult Theorem 3.2 of [14] for more details.
4. Implementation
Considering that the minimization problems of the LARE method are special cases of the LPRE procedure, we only provide a detailed implementation of the LPRE approach. Specifically, we use the Newton–Raphson iterative algorithm to solve the LPRE problem in Equation (5). Let θ = (β', a')' denote the full parameter vector. Then, the computation can be implemented as follows:
Step 1 Initialization step. In this paper, the least squares estimator is chosen as the initial estimator.
Step 2 Update the estimator θ̂^(k+1) of θ by using the following iterative procedure:

θ̂^(k+1) = θ̂^(k) − [∇²L_n(θ̂^(k))]⁻¹ ∇L_n(θ̂^(k)),

where ∇L_n(θ̂^(k)) and ∇²L_n(θ̂^(k)) represent the gradient and Hessian matrix of the LPRE objective function L_n at θ̂^(k), respectively.
Step 3 Step 2 is repeated until convergence; we use the criterion that the norm of the difference between two consecutive estimates falls below a prespecified small tolerance. Note that [8] proposed a profiled LPRE method in partial linear multiplicative models whose algorithm requires two additional conditions to hold simultaneously. Since the LPRE objective function (5) is strictly convex and infinitely differentiable, the proposed Newton–Raphson method can relax such restrictions. Moreover, the minimizer of the objective function (5) is just the root of its first derivative. We express the final LPRE estimator of θ as θ̂ = (β̂', â')'. Furthermore, the LPRE estimator of the slope function is given by α̂(t) = Σ_{j=1}^m â_j φ̂_j(t).
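The three steps above can be sketched as follows, assuming the design matrix W stacks each Z_i together with the estimated scores ξ̂_{i1}, …, ξ̂_{im}, so that θ = (β', a')'. The function name, tolerance, and iteration cap are our own choices; the least squares initializer on the log scale corresponds to Step 1.

```python
import numpy as np

def lpre_newton(W, Y, tol=1e-8, max_iter=100):
    """Newton-Raphson for the LPRE objective (up to the additive constant -2n):
    L(theta) = sum_i  Y_i * exp(-w_i'theta) + (1/Y_i) * exp(w_i'theta)."""
    # Step 1: least squares estimator on the log scale as the initial value.
    theta = np.linalg.lstsq(W, np.log(Y), rcond=None)[0]
    for _ in range(max_iter):
        u = W @ theta
        a = Y * np.exp(-u)                      # Y_i * exp(-w_i'theta)
        b = np.exp(u) / Y                       # exp(w_i'theta) / Y_i
        grad = W.T @ (b - a)                    # gradient of L at theta
        hess = W.T @ ((a + b)[:, None] * W)     # Hessian: positive definite
        step = np.linalg.solve(hess, grad)
        theta = theta - step                    # Step 2: Newton update
        if np.linalg.norm(step) < tol:          # Step 3: convergence check
            break
    return theta
```

Because the Hessian weights a + b are strictly positive, the Hessian is positive definite whenever W has full column rank, so each Newton step is well defined.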
5. Simulation Studies
In this section, the finite sample properties of the proposed estimation methods are investigated through Monte Carlo simulation studies. We compare the performance of the two proposed methods with the least absolute deviations (LAD) method in [
15] and the least squares (LS) method in [
3], where both the LS and LAD estimates are based on the logarithmic transformation on the two sides of the following model (
6). The sample size
n is set as 150, 300, and 600, and the datasets are generated from the following model:
where Z_{i1} follows the standard normal distribution, Z_{i2} follows the Bernoulli distribution with a probability of 0.5, and the true parameter values are held fixed across replications. For the functional linear component, we use a similar setting to that used by [
13] to set
and
, where
, and
s are independently distributed according to the normal distribution with mean 0 and variance
for
. Similar to [
1], the random error ε is generated from the following three distributions: (i)
, (ii)
, and (iii)
; the choice of
a satisfies
.
Implementing the proposed estimation method requires the tuning parameter
m. Here,
m is selected as the minimum value that reaches a certain proportion (denoted by τ) of the cumulative percentage of total variance (CPV) explained by the first
m leading components as follows:

m = min{ k ≥ 1 : Σ_{j=1}^k λ̂_j / Σ_{j=1}^M λ̂_j ≥ τ },

where
M is the largest number of functional principal components, such that
, and
is used in this study.
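The CPV rule can be sketched as follows, taking the estimated eigenvalues as input; the function name is illustrative, and the threshold is a parameter rather than a fixed constant.

```python
import numpy as np

def select_m_cpv(eigenvalues, threshold=0.95):
    """Smallest m such that the first m eigenvalues explain at least
    `threshold` of the total variance (the CPV rule)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    cpv = np.cumsum(lam) / np.sum(lam)        # cumulative explained variance
    return int(np.argmax(cpv >= threshold) + 1)
```

For instance, with eigenvalues (5, 3, 1, 1), the first two components explain 80% of the variance, so a threshold of 0.8 selects m = 2.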
Based on 500 replications,
Table 1 summarizes the performance of different estimators in terms of bias (Bias) and standard deviation (Sd) of the estimated
and
, as well as the mean squared error (MSE) of the estimated
.
Table 2 provides the root average squared errors (RASEs) of the estimated slope function for the LARE estimation, where the RASE is defined as follows:

RASE = { n_grid⁻¹ Σ_{k=1}^{n_grid} ( α̂(t_k) − α(t_k) )² }^{1/2},

where t_1, …, t_{n_grid} are equally spaced grid points at which the function values are calculated. We compute the RASE for each replication and report the average. In addition, the definitions of the RASE for the LPRE, LAD, and LS methods are similar; we simply replace α̂ with the corresponding estimators.
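Assuming the true and estimated slope functions can both be evaluated on the grid, the RASE computation is direct; the function signature below is illustrative.

```python
import numpy as np

def rase(alpha_hat, alpha_true, grid):
    """Root average squared error of an estimated slope function, evaluated
    on a grid of equally spaced points."""
    diff = alpha_hat(grid) - alpha_true(grid)
    return float(np.sqrt(np.mean(diff ** 2)))
```

A perfect estimate gives a RASE of zero, and a constant vertical shift of size c gives a RASE of exactly c.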
From
Table 1 and
Table 2, we have the following observations: (a) Sd, MSE, and RASE decrease and the estimation performance improves as sample size
n increases from 150 to 600. The estimates of the parametric covariate effects are basically unbiased and close to their true values, indicating that our proposed approaches produce consistent estimators. (b) When
follows the normal random error, as expected, both LS and LPRE perform the best, and LAD performs the worst. (c) When
follows
, LPRE performs the best, LARE also performs well, and LAD still performs the worst. (d) When ε follows the third error distribution, the random error violates condition C8 for the LARE method, and the zero-mean assumption of least squares regression and the zero-median assumption of LAD regression also fail to hold. Meanwhile, the LPRE method still works well in this case and performs considerably better than LARE and LAD; this indicates that LPRE is much more robust than the LARE and LAD methods. In summary, LPRE performs the best in almost all of the scenarios considered, confirming its superiority over LARE and the other competing methods.
6. Application to Tecator Data
In this section, we apply the proposed estimation methods to the Tecator data. The dataset is contained in the R package fda.usc in [16], and includes 215 independent meat samples with fat, protein, and water content measured in percent. It has been widely used in the analysis of functional data. The Tecator data consist of a 100-channel spectrum of absorbances over the wavelength range from 850 to 1050 nanometers (nm). Further details on the data can be found in [
2,
9]. The purpose is to characterize the relation among the fat content Y (response), the protein content Z₁ and water content Z₂ (real random variables), and the spectrometric curve X(t) (a functional predictor). To predict the fat content of a meat sample, we consider the following PFMRM:

Y = exp( Z₁β₁ + Z₂β₂ + ∫_T X(t) α(t) dt ) ε.  (7)
To assess the predictive capability of the proposed methods, we followed [13] to randomly divide the sample into two subsamples: a training sample, used to estimate the parameters, and a testing sample, used to check the accuracy of the prediction. We used the mean quadratic error of prediction (MQEP) as a criterion to evaluate the performance of the various estimation procedures. The MQEP is defined as follows:
where Ŷ_i is the prediction based on the training sample, and the squared prediction errors are averaged over the testing sample and normalized by the test-sample variance of the response.
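The MQEP can be sketched as follows under one common convention, namely the mean squared prediction error on the test sample normalized by the test-sample variance of the response; this normalization is an assumption here, and the function name is illustrative.

```python
import numpy as np

def mqep(y_test, y_pred):
    """Mean quadratic error of prediction, normalized by the test-sample
    variance of the response (an assumed convention)."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_test - y_pred) ** 2) / np.var(y_test))
```

Under this convention, predicting the test-sample mean for every observation yields an MQEP of exactly 1, so values below 1 indicate genuine predictive power.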
In addition, we compare the performances of the proposed model with the partial functional regression model in Shin [
3], and with the log transformation applied to both sides of model (7) (denoted as “LogPFLM”). Specifically,
The CPV criterion introduced in
Section 4 was used to determine the cutoff parameter
m. Here,
m was selected to explain approximately 95% of the variance in the Tecator data.
Table 3 shows the average MQEP over N repeated random splits. The first and second rows of Table 3 show the prediction results of the LogPFLM using the LS and LAD methods, respectively. The third and fourth rows give the prediction results of model (7) under the LARE and LPRE methods, respectively. The final row presents the prediction results of the PFLM without logarithmic transformation obtained by the LS method. Overall, the LPRE outperforms all of the other competing methods regardless of the number of random splits. LS performs the second best, whereas LAD performs the worst. In addition, we applied the above models and methods to the Tecator data considering only the scalar predictors or only the functional predictor; the results indicated relatively poor performance, so we do not report them.
Then, we used the best-performing LPRE method to estimate the unknown parameters based on the entire dataset. The estimated coefficients of protein and water are both positive; that is, both protein and water are positively associated with the logarithmic transformation of the fat content. Figure 1 depicts the estimated slope function α̂(t). In general, the spectrometric curve has a positive effect on the logarithmic transformation of the fat content, and the estimated curve α̂(t) is small in magnitude due to the large integration domain. The advantages of the LPRE method are particularly evident in the analysis of this dataset.
7. Conclusions
In this paper, we study the estimation of the PFMRM based on the LARE and LPRE criteria, where the unknown slope function and the functional predictor are approximated by the functional principal component analysis technique. Under some regularity conditions, we obtain the convergence rates of the slope function estimator and the asymptotic normality of the parameter vector for the two estimation methods. Both the numerical simulation results and the real data analysis show that the LPRE method is superior to the LARE, least squares, and least absolute deviation methods. Several issues still warrant further study. First, we chose the Karhunen–Loève expansion to approximate the slope function in this article. Other nonparametric smoothing techniques, such as B-splines, kernel estimation, and penalized regression splines, could be used in our proposed LARE and LPRE estimation methods, and their large-sample properties and finite-sample comparisons are worth studying. Furthermore, the proposed methods can also be extended to more general situations, including, but not limited to, dependent functional data, partially observed functional data, and multivariate functional data. Substantial efforts must be devoted to related advances in the future.