Article

K-Nearest Neighbor Estimation of Functional Nonparametric Regression Model under NA Samples

Xueping Hu, Jingya Wang, Liuliu Wang and Keming Yu
1 College of Mathematics and Physics, Anqing Normal University, Anqing 246133, China
2 Department of Mathematics, Brunel University, London UB8 3PH, UK
* Author to whom correspondence should be addressed.
Axioms 2022, 11(3), 102; https://doi.org/10.3390/axioms11030102
Submission received: 10 January 2022 / Revised: 16 February 2022 / Accepted: 21 February 2022 / Published: 25 February 2022
(This article belongs to the Special Issue Current Research on Mathematical Inequalities)

Abstract: Functional data, which provide information about curves, surfaces or anything else varying over a continuum, have become a commonly encountered type of data. The k-nearest neighbor (kNN) method, as a nonparametric method, has become one of the most popular supervised machine learning algorithms used to solve both classification and regression problems. This paper is devoted to the kNN estimators of the nonparametric functional regression model when the observed variables take values from negatively associated (NA) sequences. The uniform almost complete convergence rate of the proposed kNN estimator is first established. Then, numerical assessments, including a simulation study and a real data analysis, are conducted to evaluate the performance of the proposed method and to compare it with the standard nonparametric kernel approach.

1. Introduction

Functional data analysis (FDA) is a branch of statistics that analyzes data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an FDA framework, each sample element of functional data is considered to be a random function.
Popularized by Ramsay and Silverman [1,2], statistics for functional data analysis have attracted considerable research interest because of their wide applications in many practical fields, such as medicine, economics and linguistics. For an introduction to these topics, we refer to the monographs of Ramsay and Silverman [3] for parametric models and Ferraty and Vieu [4] for nonparametric models.
In this paper, the following functional nonparametric regression model is considered:
$$Y = m(\chi) + \epsilon, \qquad (1)$$
where $Y$ is a scalar response variable, $\chi$ is a covariate taking values in a subset $S_{\mathcal F}$ of an infinite-dimensional functional space $\mathcal F$ endowed with a semi-metric $d(\cdot,\cdot)$, $m(\cdot)$ is the unknown regression operator from $S_{\mathcal F}$ to $\mathbb R$, and the random error $\epsilon$ satisfies $E(\epsilon \mid \chi) = 0$, a.s.
For the estimation of model (1), Ferraty and Vieu [5] investigated the classical functional Nadaraya–Watson (N-W) kernel-type estimator of $m(\cdot)$ and obtained its asymptotic properties with rates in the case of $\alpha$-mixing functional data. Ling and Wu [6] studied a modified N-W kernel estimator and derived its asymptotic distribution for strongly mixing functional time series data, and Baíllo and Grané [7] proposed a functional local linear estimator based on the local linear idea. In this paper, we focus on the k-nearest neighbors (kNN) method for the regression model (1). The kNN method, one of the simplest and most traditional nonparametric techniques, is often used as a nonparametric classification method. It was first developed by Evelyn Fix and Joseph Hodges in 1951 [8] and then expanded by Thomas Cover [9]. In kNN regression, the input consists of the k closest training examples in a dataset, and the output is the property value for the object, namely the average of the values of its k nearest neighbors. Under independent samples, research on kNN regression mostly focuses on the estimation of the continuous regression function $m(\chi)$. For example, Burba et al. [10] investigated a kNN estimator based on the idea of a locally adaptive bandwidth for functional explanatory variables. The papers [11,12,13,14,15,16,17,18], among others, obtained the asymptotic behavior of nonparametric regression estimators for functional data in independent and dependent cases. Further, Kudraszow and Vieu [19] obtained asymptotic results for a kNN generalized regression estimator when the observed variables take values in an abstract space, and Kara-Zaitri et al. [20] provided an asymptotic theory and simulation experiments for several different target operators, including the regression, conditional density, conditional distribution and hazard operators. However, functional observations often exhibit correlation, including some form of negative dependence or negative association.
Negatively associated (NA) sequences were introduced by Joag-Dev and Proschan in [21]. Random variables $\{Y_i\}_{1 \le i \le n}$ are said to be NA if, for every pair of disjoint subsets $A, B \subset \{1, 2, \ldots, n\}$,
$$\mathrm{Cov}\big(f(Y_i, i \in A),\, g(Y_j, j \in B)\big) \le 0,$$
or, equivalently,
$$E\big(f(Y_i, i \in A)\, g(Y_j, j \in B)\big) \le E\big(f(Y_i, i \in A)\big)\, E\big(g(Y_j, j \in B)\big),$$
where $f$ and $g$ are coordinatewise non-decreasing functions such that this covariance exists. An infinite sequence $\{Y_n\}_{n \ge 1}$ is NA if every finite subcollection is NA.
For example, if $\{Y_i\}_{1 \le i \le n}$ follows a permutation distribution, i.e., $\{Y_1, Y_2, \ldots, Y_n\} = \{y_1, y_2, \ldots, y_n\}$ always, where $y_1 \le y_2 \le \cdots \le y_n$ are $n$ real numbers, then $\{Y_i\}_{1 \le i \le n}$ is NA.
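To see the definition in action, the following short Python sketch (written for this exposition, not code from the paper) estimates the covariance in the NA definition by Monte Carlo for a permutation distribution; the fixed values, the disjoint index sets and the coordinatewise non-decreasing functions f and g are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of the NA property for a permutation distribution.
rng = np.random.default_rng(0)
y = np.array([1.0, 2.0, 3.0, 4.0])          # fixed values y_1 <= ... <= y_n
A, B = [0, 1], [2, 3]                        # disjoint subsets of the index set

samples = np.array([rng.permutation(y) for _ in range(50_000)])
f_A = samples[:, A].sum(axis=1)              # coordinatewise non-decreasing f
g_B = samples[:, B].max(axis=1)              # coordinatewise non-decreasing g

# Under negative association this covariance is <= 0 (up to Monte Carlo error).
print(np.cov(f_A, g_B)[0, 1])
```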
Since kNN regression under NA sequences has not yet been explored in the literature, in this paper we extend kNN estimation for functional data from the case of independent samples to NA sequences.
Let $\{(\chi_i, Y_i)\}_{i=1,\ldots,n}$ be a sample of NA pairs distributed as $(\chi, Y)$, a random vector valued in $\mathcal F \times \mathbb R$, where $(\mathcal F, d)$ is a semi-metric space; $\mathcal F$ is not necessarily of finite dimension and we do not assume the existence of a density for the functional random variable $\chi$. For a fixed $\chi \in \mathcal F$, the closed ball centered at $\chi$ with radius $\epsilon$ is denoted by
$$B(\chi, \epsilon) = \{\chi' \in \mathcal F \mid d(\chi', \chi) \le \epsilon\}.$$
The kNN regression estimator [10] is defined as
$$\hat m_{kNN}(\chi) = \frac{\sum_{i=1}^n Y_i\, K\big(H_{n,k}(\chi)^{-1} d(\chi_i, \chi)\big)}{\sum_{i=1}^n K\big(H_{n,k}(\chi)^{-1} d(\chi_i, \chi)\big)}, \quad \chi \in \mathcal F,$$
where $K(\cdot)$ is a kernel function supported on $[0, \infty)$ and $H_{n,k}(\chi)$ is a positive random variable depending on $(\chi_1, \chi_2, \ldots, \chi_n)$, defined by
$$H_{n,k}(\chi) = \min\Big\{ h \in \mathbb R^+ : \sum_{i=1}^n \mathbb{1}_{B(\chi, h)}(\chi_i) = k \Big\}.$$
Obviously, the kNN estimator can be seen as an extension, to a random locally adaptive neighborhood, of the traditional kernel estimator [5] defined as
$$\hat m_n(\chi) = \frac{\sum_{i=1}^n Y_i\, K\big(h_n(\chi)^{-1} d(\chi_i, \chi)\big)}{\sum_{i=1}^n K\big(h_n(\chi)^{-1} d(\chi_i, \chi)\big)}, \quad \chi \in \mathcal F, \qquad (3)$$
where $h_n(\chi)$ is a sequence of positive real numbers such that $h_n(\chi) \to 0$ as $n \to \infty$.
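To make the two estimators concrete, the following Python sketch (a minimal implementation written for this exposition, not code from the paper) computes the kNN estimator, with the locally adaptive bandwidth $H_{n,k}(\chi)$ taken as the k-th smallest semi-metric distance to the training curves, and the classical kernel estimator with a fixed bandwidth $h$. The kernel choice and the helper names knn_regress and nw_regress are our own assumptions.

```python
import numpy as np

def epanechnikov(u):
    """Quadratic kernel K(u) = 0.75 * (1 - u^2) on [0, 1] (also used in Section 3.1).
    Since K(1) = 0, the k-th neighbour itself receives zero weight, so in
    practice k >= 2 is used with this kernel."""
    u = np.asarray(u, dtype=float)
    return np.where((u >= 0) & (u <= 1), 0.75 * (1.0 - u**2), 0.0)

def knn_regress(d_to_train, y_train, k, kernel=epanechnikov):
    """Functional kNN estimator: H_{n,k}(chi) is the k-th smallest semi-metric
    distance from chi to the training curves."""
    d = np.asarray(d_to_train, dtype=float)
    H = np.sort(d)[k - 1]                  # smallest h with k curves in B(chi, h)
    w = kernel(d / H)
    return float(np.sum(w * y_train) / np.sum(w))

def nw_regress(d_to_train, y_train, h, kernel=epanechnikov):
    """Classical Nadaraya-Watson kernel estimator with a fixed bandwidth h."""
    w = kernel(np.asarray(d_to_train, dtype=float) / h)
    return float(np.sum(w * y_train) / np.sum(w))
```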
This paper is organized as follows. The main results on the asymptotic behavior of the kNN estimator, which uses a data-driven random number of neighbors, are given in Section 2. Section 3 illustrates the numerical performance of the proposed method, including a nonparametric functional regression analysis of sea surface temperature (SST) data for the El Niño region (0–10° S, 80–90° W). The technical proofs are postponed to Section 4. Finally, Section 5 is devoted to comments on the results and to related perspectives for future work.

2. Assumptions and Main Results

In this section, we focus on the asymptotic properties of the kNN regression estimator; we first recall the notion of almost complete convergence rate.
One says that the rate of almost complete convergence of a sequence $\{Y_n, n \ge 1\}$ to $Y$ is of order $u_n$ if and only if, for any $\epsilon > 0$,
$$\sum_{n=1}^{\infty} P\big(|Y_n - Y| > \epsilon u_n\big) < \infty,$$
and we write $Y_n - Y = O_{a.co.}(u_n)$ (see, for instance, [5]). By the Borel–Cantelli lemma, this implies that $(Y_n - Y)/u_n \to 0$ almost surely, so almost complete convergence is stronger than almost sure convergence.
Our results are stated under some mild assumptions, which we gather below for easy reference. Throughout the paper, $C, C_1, C_2, \ldots$ denote positive generic constants, which may differ from place to place.
Assumption 1.
$\forall \epsilon > 0$, $P\big(\chi \in B(\chi, \epsilon)\big) = \varphi_\chi(\epsilon) > 0$, and $\varphi_\chi(\cdot)$ is a continuous function, strictly monotonically increasing at the origin with $\varphi_\chi(0) = 0$.
Assumption 2.
There exist a function $\phi(\cdot) \ge 0$ and a bounded function $f(\cdot) > 0$ such that:
(i) $\phi(0) = 0$ and $\lim_{\epsilon \to 0} \phi(\epsilon) = 0$;
(ii) $\lim_{\epsilon \to 0} \dfrac{\phi(u\epsilon)}{\phi(\epsilon)} = 0$, for any $u \in [0, 1]$;
(iii) there exists $\tau > 0$ such that $\sup_{\chi \in S_{\mathcal F}} \left| \dfrac{\varphi_\chi(\epsilon)}{\phi(\epsilon)} - f(\chi) \right| = O(\epsilon^{\tau})$ as $\epsilon \to 0$.
Assumption 3.
$K(t)$ is a nonnegative bounded kernel function with support $[0, 1]$ such that $K(1) > 0$, and the derivative $K'(t)$ exists on $[0, 1]$ satisfying
$$-\infty < C < K'(t) < C' < 0, \quad \text{for } t \in [0, 1].$$
Assumption 4.
$m(\cdot)$ is a bounded Lipschitz operator of order $\beta$ on $S_{\mathcal F}$; that is, there exists $\beta > 0$ such that
$$\forall \chi_1, \chi_2 \in S_{\mathcal F}, \quad |m(\chi_1) - m(\chi_2)| \le C\, d(\chi_1, \chi_2)^{\beta}.$$
Assumption 5.
$\forall m \ge 2$, $E\big(|Y|^m \mid \chi\big) = \delta_m(\chi) < C$, with $\delta_m(\cdot)$ continuous on $S_{\mathcal F}$.
Assumption 6.
Kolmogorov's $\epsilon$-entropy of $S_{\mathcal F}$ satisfies
$$\sum_{n=1}^{\infty} \exp\Big\{ (1 - \omega)\, \Psi_{S_{\mathcal F}}\Big(\frac{\log n}{n}\Big) \Big\} < \infty, \quad \text{for some } \omega > 1.$$
For $\epsilon > 0$, the Kolmogorov $\epsilon$-entropy of a set $S_{\mathcal F} \subset \mathcal F$ is defined by $\Psi_{S_{\mathcal F}}(\epsilon) = \log\big(N_\epsilon(S_{\mathcal F})\big)$, where $N_\epsilon(S_{\mathcal F})$ is the minimal number of open balls of radius $\epsilon$ in $\mathcal F$, centered at $\chi_1, \chi_2, \ldots, \chi_{N_\epsilon(S_{\mathcal F})}$, needed to cover $S_{\mathcal F}$.
Remark 1.
Assumptions 1, 2(i)–(iii) and 4 are standard assumptions on small ball probabilities and on the regression operator in nonparametric FDA; see Kudraszow and Vieu [19]. Assumption 2(ii) plays a key role in the methodology, in particular when computing the asymptotic variance and making it explicit, as in Ling and Wu [6]. Assumption 2(iii) states that the small ball probability can be written as the product of the two independent functions $\phi(\cdot)$ and $f(\cdot)$, which has been used many times in Masry [11], Laib and Louani [12] and elsewhere. Assumption 5 is standard in the nonparametric setting and concerns the existence of conditional moments, as in Masry [11] and Burba et al. [10]; it is used to obtain the rate of uniform almost complete convergence. Assumption 6 is the Kolmogorov $\epsilon$-entropy condition, which we use in the proof of the rate of uniform almost complete convergence.
Theorem 1.
Under Assumptions 1–6, suppose that the sequence $\{k_n, n \ge 1\}$ satisfies $k_n/n \to 0$ as $n \to \infty$, and, for $n$ large enough,
$$\frac{\log^2 n}{k_n} < \Psi_{S_{\mathcal F}}\Big(\frac{\log n}{n}\Big) < \frac{k_n}{\log n}, \qquad 0 < C_1 < \frac{k_n}{\log^2 n} < C_2 < \infty.$$
Then we have
$$\sup_{\chi \in S_{\mathcal F}} \big| \hat m_{kNN}(\chi) - m(\chi) \big| = O_{a.co.}\left( \Big( \phi^{-1}\Big(\frac{k_n}{n}\Big) \Big)^{\beta} + \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\log n / n)}{n^2}} \right).$$
Remark 2.
Theorem 1 extends the kNN estimation result of Theorem 2 in Kudraszow and Vieu [19] from the independent case to the NA dependent case and obtains the same convergence rate under the stated assumptions. Moreover, the almost complete convergence rate of the regression operator estimator splits into two parts: one part is governed by the NA dependence and Kolmogorov's $\epsilon$-entropy, while the other depends on the smoothness of the regression operator and the smoothing parameter $k_n$.
Corollary 1.
Under the conditions of Theorem 1, we have
$$\sup_{\chi \in S_{\mathcal F}} \big| \hat m_{kNN}(\chi) - m(\chi) \big| = O_{a.s.}\left( \Big( \phi^{-1}\Big(\frac{k_n}{n}\Big) \Big)^{\beta} + \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\log n / n)}{n^2}} \right).$$
Corollary 2.
Under the conditions of Theorem 1, we have
$$\sup_{\chi \in S_{\mathcal F}} \big| \hat m_{kNN}(\chi) - m(\chi) \big| = O_{P}\left( \Big( \phi^{-1}\Big(\frac{k_n}{n}\Big) \Big)^{\beta} + \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\log n / n)}{n^2}} \right).$$

3. Simulation

3.1. A Simulation Study

In this section, we aim to illustrate the performance of the kNN estimator of the nonparametric functional regression model and to compare it with the traditional Nadaraya–Watson (NW) kernel estimation method. We consider the nonparametric functional regression model:
$$Y_i = m(\chi_i) + \varepsilon_i,$$
where $m(\chi_i) = \big( \int_0^{\pi/5} \chi_i(t)\, dt \big)^2$, $\varepsilon_i$ is distributed according to $N(0, 0.05)$, and the functional curves $\chi_i(t)$ are generated in the following way:
$$\chi_i(t) = a_i t^3 + \arctan\Big( b_i \Big( t - \frac{\pi}{5} \Big) \Big), \quad t \in \Big[ 0, \frac{\pi}{5} \Big], \quad i = 1, 2, \ldots, n,$$
where $a_i \sim N(0, \pi/10)$, $i = 1, 2, \ldots, n$, $(b_1, b_2, \ldots, b_n) \sim N_n(\mathbf 0, \Sigma)$, $\mathbf 0$ denotes the zero vector, and the covariance matrix is the tridiagonal matrix
$$\Sigma = \begin{pmatrix}
1+\theta^2 & -\theta & 0 & \cdots & 0 & 0 \\
-\theta & 1+\theta^2 & -\theta & \cdots & 0 & 0 \\
0 & -\theta & 1+\theta^2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1+\theta^2 & -\theta \\
0 & 0 & 0 & \cdots & -\theta & 1+\theta^2
\end{pmatrix}_{n \times n}, \quad 0 < \theta < 1.$$
By the definition of NA, it can be seen that $(b_1, b_2, \ldots, b_n)$ is an NA vector for each $n \ge 3$, with finite moments of any order (see Wu and Wang [22]).
We arbitrarily set $\theta = 0.4$ and the sample size $n = 330$, and let $t$ take 1000 equispaced values in $[0, \pi/5]$. The 330 simulated curves $\chi_i(t)$ are displayed in Figure 1.
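A possible way to reproduce this data-generating mechanism is sketched below in Python. The following are our assumptions rather than details stated by the paper: $\pi/10$ and $0.05$ are read as variances, the off-diagonal entries of $\Sigma$ are taken as $-\theta$ so that the Gaussian vector is negatively correlated (hence NA), and the curve formula follows our reading of the displayed equation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta, n_grid = 330, 0.4, 1000
t = np.linspace(0.0, np.pi / 5, n_grid)      # 1000 equispaced points in [0, pi/5]

# a_i ~ N(0, pi/10): pi/10 read here as the variance.
a = rng.normal(0.0, np.sqrt(np.pi / 10), size=n)

# (b_1, ..., b_n): Gaussian vector with tridiagonal covariance (1 + theta^2 on the
# diagonal, -theta on the first off-diagonals), i.e. negatively correlated and
# hence NA.  Equivalent MA(1)-type construction: b_i = e_i - theta * e_{i+1}.
e = rng.normal(size=n + 1)
b = e[:n] - theta * e[1:]

# Curves chi_i(t) = a_i t^3 + arctan(b_i (t - pi/5)), our reading of the formula.
chi = a[:, None] * t[None, :] ** 3 + np.arctan(b[:, None] * (t[None, :] - np.pi / 5))

# Responses Y_i = m(chi_i) + eps_i with m(chi) = (integral of chi over [0, pi/5])^2
# and eps_i ~ N(0, 0.05), 0.05 read as the variance.
m_chi = np.trapz(chi, t, axis=1) ** 2
Y = m_chi + rng.normal(0.0, np.sqrt(0.05), size=n)
```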
We consider the Epanechnikov kernel $K(u) = \frac{3}{4}(1 - u^2)\, \mathbb{1}_{[0,1]}(u)$ and the family of semi-metrics $d(\cdot, \cdot)$ based on derivatives of order $q$:
$$d(\chi_i, \chi_j) = \sqrt{ \int_0^{\pi/5} \big( \chi_i^{(q)}(t) - \chi_j^{(q)}(t) \big)^2 \, dt }, \quad \chi_i, \chi_j \in \mathcal F, \quad q \in \{0, 1, 2, \ldots\}.$$
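A numerical version of this derivative-based semi-metric can be sketched as follows; it is an illustrative implementation in which the placement of the square root and the use of finite-difference derivatives on the observation grid are our assumptions.

```python
import numpy as np

def semimetric_deriv(x, y, t, q=1):
    """Semi-metric based on derivatives of order q:
    d_q(x, y) = sqrt( integral of (x^(q)(t) - y^(q)(t))^2 dt ),
    with the derivatives approximated by finite differences on the grid t."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    for _ in range(q):
        diff = np.gradient(diff, t)
    return float(np.sqrt(np.trapz(diff**2, t)))
```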
Our purpose is to compare the mean square error (MSE) of the kNN method with that of the NW kernel approach on finite simulated datasets. The finite-sample simulation proceeds in the following steps (a code sketch illustrating Steps 2–4 follows the list).
Step 1: We take 300 curves to construct the training sample $\{(\chi_i, Y_i)\}_{i=1}^{300}$, and the remaining 30 curves constitute the test sample $\{(\chi_i, Y_i)\}_{i=301}^{330}$.
Step 2: On the training sample, the tuning parameters $k$ (kNN method) and $h$ (NW kernel method) are selected automatically by cross-validation.
Step 3: Based on the MSE criterion (see [4] for details), the semi-metric parameter $q$ is taken as $q = 1$ for both the kNN and the NW methods.
Step 4: The predicted responses $\{\hat Y_i\}_{i=301}^{330}$ (kNN) and $\{\tilde Y_i\}_{i=301}^{330}$ (NW) for the test sample are computed, and their MSEs and scatter plots against the true values $\{Y_i\}_{i=301}^{330}$ are shown in Figure 2.
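The sketch below illustrates Steps 2–4: leave-one-out cross-validation over $k$ on the training sample, followed by prediction and MSE on the 30 test curves. It builds on the simulated data (chi, Y, t) and the hypothetical helpers knn_regress and semimetric_deriv from the earlier sketches; the grid of candidate $k$ values is an arbitrary choice.

```python
import numpy as np

train, test = np.arange(300), np.arange(300, 330)

# Pairwise semi-metric distances with q = 1 (Step 3).
n_all = len(chi)
D = np.array([[semimetric_deriv(chi[i], chi[j], t, q=1) for j in range(n_all)]
              for i in range(n_all)])

def loo_cv(k):
    """Leave-one-out cross-validation error of the kNN estimator on the training set."""
    err = 0.0
    for i in train:
        others = np.setdiff1d(train, [i])
        err += (knn_regress(D[i, others], Y[others], k) - Y[i]) ** 2
    return err / len(train)

# Step 2: data-driven choice of k over an arbitrary grid of candidates.
k_opt = min(range(2, 41), key=loo_cv)

# Step 4: predict the 30 test responses and report the MSE.
Y_hat = np.array([knn_regress(D[i, train], Y[train], k_opt) for i in test])
print(k_opt, np.mean((Y_hat - Y[test]) ** 2))
```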
As can be seen in Figure 2, the MSE of the kNN method is much smaller than that of the NW method, and its scatter points are more densely distributed around the line $y = x$, which shows that the kNN method fits better and has higher prediction accuracy for NA dependent functional samples.
The kNN and NW methods were then used to conduct 100 independent replications for sample sizes $n = 200, 300, 500$ and $800$. The average mean square error (AMSE) was calculated for both methods at each sample size via
$$\mathrm{AMSE} = \frac{1}{100} \sum_{j=1}^{100} \frac{1}{30} \sum_{i=n-30+1}^{n} \big( \bar Y_i - Y_i \big)^2, \qquad \bar Y_i \in \{\hat Y_i, \tilde Y_i\}, \quad n = 200, 300, 500, 800.$$
As can be seen from Table 1, the AMSE of the kNN method is much smaller than that of the NW kernel method for each fixed sample size $n = 200, 300, 500, 800$; for a fixed estimation method, the AMSEs of both methods show the same trend: they decrease as the sample size increases. However, the AMSE of the kNN method decreases noticeably faster than that of the NW kernel method.

3.2. A Real Data Study

This section applies the proposed kNN regression to the analysis of sea surface temperature (SST) data for the El Niño region (0–10° S, 80–90° W) over 31 years, from 1 January 1990 to 31 December 2020. The data are available online at https://www.cpc.ncep.noaa.gov/data/indices/ (accessed on 1 January 2022). More relevant discussion of these data can be found in Ezzahrioui et al. [13,14], Delsol [23] and Ferraty et al. [24]. The 1618 weekly SST records in the original data were preprocessed and averaged by month to obtain 372 monthly mean SST values. Figure 3 displays the multiplicative time series decomposition of the monthly SST.
Figure 4 shows that the monthly mean SST in the El Niño region from 1990 to 2020 had a clear seasonal variation, and the monthly trend of SST can also be clearly observed from the seasonal index plot of the monthly mean SST.
The main factors affecting the temperature variation can be generally summarized as seasonal factors and random fluctuations. If the seasonal factor is removed, the SST should be left with only random fluctuations, i.e., the values fluctuate up and down at some mean value. At the same time, if the effect of random fluctuations is not considered, the SST is left with only the seasonal factor, i.e., the SST will have similar values in the same month in different years.
The following steps implement the kNN regression estimation method on the SST data and compare it with the NW estimation method in Figure 5.
Step 1: Transform the 372 months (31 years) of SST data $\{Z_i, i = 1, \ldots, 372\}$ into functional data.
Step 2: Divide the 31 samples $(\chi_j, Y_j(s))_{j=1,\ldots,31}$ into two parts: 30 training samples $(\chi_j, Y_j(s))_{j=1,\ldots,30}$ for model fitting and 1 test sample $(\chi_{31}, Y_{31}(s))$ for prediction assessment.
Step 3: Semi-metrics based on functional principal component analysis (FPCA), which are suitable for rough curves such as the SST data, are used here (see Chapter 3 of Ferraty et al. [25] for the methodology). The quadratic kernel of Section 3.1 is used in the kNN regression.
Step 4: The SST values $(\hat Y_{31}(s), s = 1, \ldots, 12)$ for the 12 months of 2020 are predicted by the kNN and NW methods, respectively, and the MSEs of both methods are obtained.
In Step 1, we split the discrete monthly mean temperature data of 372 months into 31 yearly temperature curves, expressed as $\chi_j = \{Z(t), 12(j-1) < t \le 12j\}$, $j = 1, \ldots, 31$. The response variable can then be expressed as $Y_j(s) = Z_{12j+s}$, $s = 1, \ldots, 12$, $j = 1, \ldots, 30$. Thus, $(\chi_j, Y_j(s))_{j=1,\ldots,30}$ is a dependent functional sample of size 30, where $\chi_j$ is functional data and $Y_j(s)$ is real-valued.
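For concreteness, one plausible reading of this construction is sketched below in Python; the file name, the pairing of each yearly curve with the following year's monthly values, and the train/test split are assumptions made for illustration, since the indexing in the text admits more than one interpretation.

```python
import numpy as np

# Z: the 372 monthly mean SST values (January 1990 - December 2020), assumed to be
# stored in a plain text file; the file name is hypothetical.
Z = np.loadtxt("monthly_sst.txt")
years = Z.reshape(31, 12)        # row j: the 12 monthly values of year 1990 + j

# One plausible pairing: the curve of year j predicts the 12 monthly values of
# year j + 1, giving 30 pairs (chi_j, Y_j(s)).
chi_sst = years[:30]             # functional predictors (1990, ..., 2019)
Y_sst = years[1:]                # responses Y_j(s), s = 1, ..., 12 (1991, ..., 2020)

# Model fitting on the first 29 pairs; the last pair is kept to predict the 12
# monthly SST values of 2020 from the 2019 curve.
chi_train, Y_train = chi_sst[:29], Y_sst[:29]
chi_test, Y_test = chi_sst[29], Y_sst[29]
```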
In Step 3, the parameter $q$ for the kNN and NW methods is chosen by cross-validation in R, which gives $q = 3$ for the kNN regression method and $q = 1$ for the NW method. The parameters $k$ and $h$ are selected as in Section 3.1.
Figure 5 compares the MSE values computed by the two methods; the MSE of the kNN method is much smaller than that of the NW method. Moreover, regarding how closely the fitted curves track the true curve (dotted line), the predicted curves of both methods are generally close to the true curve, indicating that both methods predict well. A closer look reveals, however, that the kNN predictions clearly fit better at the inflection points of the curve, such as January, February, March, November and December, which reflects the fact that the kNN method pays more attention to local variation than the NW method when processing data of this kind, including abnormal or extreme values of the response variable.

4. Proof of Theorem

In order to prove the main results, we give some lemmas. Let $(A_i, B_i)$, $i = 1, 2, \ldots, n$, be $n$ random pairs valued in $(\Omega \times \mathbb R, \mathcal A \times \mathcal B(\mathbb R))$, where $(\Omega, \mathcal A)$ is a general measurable space. Let $S_\Omega$ be a fixed subset of $\Omega$, and let $G(\cdot, (\chi, \cdot)) : \mathbb R \times (S_\Omega \times \Omega) \to \mathbb R^+$ be a measurable function such that, for $t, t' \in \mathbb R$,
$$(L0): \quad t \le t' \implies G(t, z) \le G(t', z), \quad \forall z \in S_\Omega \times \Omega.$$
Let $\{D_n(\chi)\}_{n \in \mathbb N}$ be a sequence of real random variables (r.r.v.) and let $c(\cdot): S_\Omega \to \mathbb R$ be a nonrandom function such that $\sup_{\chi \in S_\Omega} |c(\chi)| < \infty$. For $\chi \in S_\Omega$ and $n \in \mathbb N \setminus \{0\}$, we define
$$c_{n,\chi}(t) = \frac{\sum_{i=1}^n B_i\, G\big(t, (\chi, A_i)\big)}{\sum_{i=1}^n G\big(t, (\chi, A_i)\big)}.$$
Lemma 1
([10]). Let $\{u_n\}_{n \in \mathbb N}$ be a decreasing positive real sequence satisfying $\lim_{n\to\infty} u_n = 0$. For any increasing sequence $\beta_n \in (0, 1)$ with $\beta_n - 1 = O(u_n)$, suppose there exist two sequences of real random variables $\{D_n^-(\beta_n, \chi)\}_{n \in \mathbb N}$ and $\{D_n^+(\beta_n, \chi)\}_{n \in \mathbb N}$ such that:
(L1) $D_n^-(\beta_n, \chi) \le D_n^+(\beta_n, \chi)$, $\forall n \in \mathbb N$, $\forall \chi \in S_\Omega$;
(L2) $\mathbb{1}_{\{D_n^-(\beta_n,\chi) \le D_n(\chi) \le D_n^+(\beta_n,\chi),\ \forall \chi \in S_\Omega\}} \to 1$, a.co., as $n \to \infty$;
(L3) $\sup_{\chi \in S_\Omega} \left| \dfrac{\sum_{i=1}^n G\big(D_n^-(\beta_n,\chi), (\chi, A_i)\big)}{\sum_{i=1}^n G\big(D_n^+(\beta_n,\chi), (\chi, A_i)\big)} - \beta_n \right| = O_{a.co.}(u_n)$;
(L4) $\sup_{\chi \in S_\Omega} \big| c_{n,\chi}\big(D_n^-(\beta_n,\chi)\big) - c(\chi) \big| = O_{a.co.}(u_n)$;
(L5) $\sup_{\chi \in S_\Omega} \big| c_{n,\chi}\big(D_n^+(\beta_n,\chi)\big) - c(\chi) \big| = O_{a.co.}(u_n)$.
Then, we have
$$\sup_{\chi \in S_\Omega} \big| c_{n,\chi}\big(D_n(\chi)\big) - c(\chi) \big| = O_{a.co.}(u_n).$$
The proof of Lemma 1 is not presented here because it follows, step by step, the same arguments as in Burba et al. [10] and Kudraszow and Vieu [19].
Lemma 2
([26]). Let $\{X_n, n \in \mathbb N\}$ be an NA random sequence with zero mean such that there exist positive constants $c_k$, $k = 1, 2, \ldots, n$, with $|X_k| \le c_k$, and let $S_n = X_1 + X_2 + \cdots + X_n$. Then, for any $\epsilon > 0$,
$$P(S_n \ge n\epsilon) \le \exp\left( -\frac{n^2 \epsilon^2}{2 \sum_{i=1}^n c_i^2} \right), \qquad (4)$$
and
$$P(|S_n| \ge n\epsilon) \le 2 \exp\left( -\frac{n^2 \epsilon^2}{2 \sum_{i=1}^n c_i^2} \right).$$
Lemma 3.
Suppose that Assumptions 1–6 hold and that the bandwidth $h_n(\chi) \to 0$ ($n \to \infty$) in the estimator (3) satisfies
$$\lim_{n \to \infty} \frac{\varphi_\chi\big(H_{n,k}(\chi)\big)}{\varphi_\chi\big(h_n(\chi)\big)} = 1, \qquad (5)$$
$$0 < C_1 h_n \le \inf_{\chi \in S_{\mathcal F}} h_n(\chi) \le \sup_{\chi \in S_{\mathcal F}} h_n(\chi) \le C_2 h_n < \infty, \qquad (6)$$
and, for $n$ large enough,
$$\frac{\log^2 n}{n \phi(h_n)} < \Psi_{S_{\mathcal F}}\Big(\frac{\log n}{n}\Big) < \frac{n \phi(h_n)}{\log n}, \qquad (7)$$
$$0 < C_1 < \frac{n \phi(h_n)}{\log^2 n} < C_2 < \infty. \qquad (8)$$
Then we have
$$\sup_{\chi \in S_{\mathcal F}} \big| \hat m_n(\chi) - m(\chi) \big| = O_{a.co.}\big( h_n^{\beta} \big) + O_{a.co.}\left( \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right), \qquad (9)$$
where $\epsilon = \log n / n$.
Proof of Lemma 3. 
In order to simplify the proof, we introduce some notation. For $\chi \in S_{\mathcal F}$, let $k(\chi) = \arg\min_{k \in \{1, 2, \ldots, N_\epsilon(S_{\mathcal F})\}} d(\chi, \chi_k)$, and let $s_n^2 = \max\{ s_{n,1}^2, s_{n,2}^2, s_{n,3}^2, s_{n,4}^2 \}$ denote the mixed operator covariance, where
$$s_{n,1}^2 = \sum_{i=1}^n \sum_{j=1}^n \mathrm{Cov}(Y_i u_i, Y_j u_j), \qquad s_{n,2}^2 = \sum_{i=1}^n \sum_{j=1}^n \mathrm{Cov}(v_i, v_j),$$
$$s_{n,3}^2 = \sum_{i=1}^n \sum_{j=1}^n \mathrm{Cov}(u_i, u_j), \qquad s_{n,4}^2 = \sum_{i=1}^n \sum_{j=1}^n \mathrm{Cov}(w_i, w_j),$$
with $u_i = \mathbb{1}_{B(\chi_{k(\chi)},\, C_2 h_n(\chi) + \epsilon)}(\chi_i)$, $0 < h_n \to 0$,
$$v_i = \frac{Y_i K\big(h_n(\chi_{k(\chi)})^{-1} d(\chi_{k(\chi)}, \chi_i)\big) - E\Big[ Y_i K\big(h_n(\chi_{k(\chi)})^{-1} d(\chi_{k(\chi)}, \chi_i)\big) \Big]}{E\Big[ K\big(h_n(\chi_{k(\chi)})^{-1} d(\chi_{k(\chi)}, \chi_1)\big) \Big]},$$
$$w_i = \frac{K\big(h_n(\chi_{k(\chi)})^{-1} d(\chi_{k(\chi)}, \chi_i)\big) - E\Big[ K\big(h_n(\chi_{k(\chi)})^{-1} d(\chi_{k(\chi)}, \chi_i)\big) \Big]}{E\Big[ K\big(h_n(\chi_{k(\chi)})^{-1} d(\chi_{k(\chi)}, \chi_1)\big) \Big]}.$$
For fixed $\chi \in S_{\mathcal F}$ in the estimator (3), we have the decomposition
$$\hat m_n(\chi) - m(\chi) = \frac{\hat m_{2n}(\chi)}{\hat m_{1n}(\chi)} - m(\chi) = \frac{1}{\hat m_{1n}(\chi)} \big[ \hat m_{2n}(\chi) - E \hat m_{2n}(\chi) \big] + \frac{1}{\hat m_{1n}(\chi)} \big[ E \hat m_{2n}(\chi) - m(\chi) \big] + \frac{m(\chi)}{\hat m_{1n}(\chi)} \big[ 1 - \hat m_{1n}(\chi) \big],$$
where
$$\hat m_{1n}(\chi) = \frac{\sum_{i=1}^n K\big(h_n(\chi)^{-1} d(\chi, \chi_i)\big)}{n E\big[ K\big(h_n(\chi)^{-1} d(\chi, \chi_1)\big) \big]}, \qquad \hat m_{2n}(\chi) = \frac{\sum_{i=1}^n Y_i K\big(h_n(\chi)^{-1} d(\chi, \chi_i)\big)}{n E\big[ K\big(h_n(\chi)^{-1} d(\chi, \chi_1)\big) \big]}.$$
It suffices to prove the following three results in order to establish (9):
$$\sup_{\chi \in S_{\mathcal F}} \big| E \hat m_{2n}(\chi) - m(\chi) \big| = O_{a.co.}\big( h_n^{\beta} \big), \qquad (10)$$
$$\sup_{\chi \in S_{\mathcal F}} \big| \hat m_{2n}(\chi) - E \hat m_{2n}(\chi) \big| = O_{a.co.}\left( \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right), \qquad (11)$$
$$\sup_{\chi \in S_{\mathcal F}} \big| \hat m_{1n}(\chi) - 1 \big| = \sup_{\chi \in S_{\mathcal F}} \big| \hat m_{1n}(\chi) - E \hat m_{1n}(\chi) \big| = O_{a.co.}\left( \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right). \qquad (12)$$
We first prove Equation (10). For $\chi \in S_{\mathcal F}$, by Equation (6) and Assumption 4, it follows that
$$\big| E \hat m_{2n}(\chi) - m(\chi) \big| = \left| \frac{E\Big[ \sum_{i=1}^n Y_i K\big(h_n(\chi)^{-1} d(\chi, \chi_i)\big) \Big]}{n E\big[ K\big(h_n(\chi)^{-1} d(\chi, \chi_1)\big) \big]} - m(\chi) \right| = \left| \frac{E\Big[ \big( m(\chi_1) - m(\chi) \big) K\big(h_n(\chi)^{-1} d(\chi, \chi_1)\big) \Big]}{E\Big[ K\big(h_n(\chi)^{-1} d(\chi, \chi_1)\big) \Big]} \right| \le \sup_{\chi' \in B(\chi, h_n(\chi))} \big| m(\chi') - m(\chi) \big| = O\big( h_n^{\beta} \big),$$
uniformly in $\chi \in S_{\mathcal F}$, which gives (10).
Next, we show Equation (11). We have the decomposition
$$\sup_{\chi \in S_{\mathcal F}} \big| \hat m_{2n}(\chi) - E \hat m_{2n}(\chi) \big| \le \sup_{\chi \in S_{\mathcal F}} \big| \hat m_{2n}(\chi) - \hat m_{2n}(\chi_{k(\chi)}) \big| + \sup_{\chi \in S_{\mathcal F}} \big| \hat m_{2n}(\chi_{k(\chi)}) - E \hat m_{2n}(\chi_{k(\chi)}) \big| + \sup_{\chi \in S_{\mathcal F}} \big| E \hat m_{2n}(\chi_{k(\chi)}) - E \hat m_{2n}(\chi) \big| =: I_1 + I_2 + I_3.$$
For $I_1$, by Assumption 3 it is easily seen that
$$0 < C_1 < E\Big[ K\big(h_n(\chi)^{-1} d(\chi, \chi_i)\big) \Big] < C_2 < \infty,$$
thus
$$\begin{aligned}
I_1 &= \sup_{\chi \in S_{\mathcal F}} \left| \frac{\sum_{i=1}^n Y_i K\big(h_n(\chi)^{-1} d(\chi, \chi_i)\big)}{n E\big[ K\big(h_n(\chi)^{-1} d(\chi, \chi_1)\big) \big]} - \frac{\sum_{i=1}^n Y_i K\big(h_n(\chi_{k(\chi)})^{-1} d(\chi_{k(\chi)}, \chi_i)\big)}{n E\big[ K\big(h_n(\chi_{k(\chi)})^{-1} d(\chi_{k(\chi)}, \chi_1)\big) \big]} \right| \\
&\le C \sup_{\chi \in S_{\mathcal F}} \frac{1}{n} \sum_{i=1}^n |Y_i| \left| K\Big( \frac{d(\chi, \chi_i)}{h_n(\chi)} \Big) - K\Big( \frac{d(\chi_{k(\chi)}, \chi_i)}{h_n(\chi_{k(\chi)})} \Big) \right| \mathbb{1}_{B(\chi, h_n(\chi)) \cup B(\chi_{k(\chi)}, h_n(\chi_{k(\chi)}))}(\chi_i) \\
&\le C \sup_{\chi \in S_{\mathcal F}} \frac{1}{n} \sum_{i=1}^n |Y_i|\, \mathbb{1}_{B(\chi_{k(\chi)},\, C_2 h_n(\chi) + \epsilon)}(\chi_i),
\end{aligned}$$
and, for $\eta > 0$, we have
$$P\left( I_1 > \eta \sqrt{\frac{s_{n,1}^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right) \le P\left( C \sup_{\chi \in S_{\mathcal F}} \frac{1}{n} \sum_{i=1}^n |Y_i|\, \mathbb{1}_{B(\chi_{k(\chi)},\, C_2 h_n(\chi) + \epsilon)}(\chi_i) > \eta \sqrt{\frac{s_{n,1}^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right) \le C\, N_\epsilon(S_{\mathcal F}) \max_{\chi_k \in \{\chi_1, \ldots, \chi_{N_\epsilon(S_{\mathcal F})}\}} P\left( \sum_{i=1}^n |Y_i|\, \mathbb{1}_{B(\chi_k,\, C_2 h_n(\chi) + \epsilon)}(\chi_i) > \eta \sqrt{s_{n,1}^2\, \Psi_{S_{\mathcal F}}(\epsilon)} \right).$$
According to (4) in Lemma 2 and Assumption 6, we have
$$P\left( \sum_{i=1}^n |Y_i|\, \mathbb{1}_{B(\chi_k,\, C_2 h_n(\chi) + \epsilon)}(\chi_i) > \eta \sqrt{s_{n,1}^2\, \Psi_{S_{\mathcal F}}(\epsilon)} \right) \le \exp\left( -\frac{\eta^2 s_{n,1}^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{2 \sum_{i=1}^n c_i^2} \right),$$
so that, multiplying by $N_\epsilon(S_{\mathcal F}) = \exp\big( \Psi_{S_{\mathcal F}}(\epsilon) \big)$ and summing over $n$,
$$\sum_{n=1}^{\infty} \exp\left\{ \Big( 1 - \frac{\eta^2 s_{n,1}^2}{2 \sum_{i=1}^n c_i^2} \Big) \Psi_{S_{\mathcal F}}(\epsilon) \right\} < \infty,$$
for $\eta$ large enough, by Assumption 6.
Hence, it follows that
$$I_1 = O_{a.co.}\left( \sqrt{\frac{s_{n,1}^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right). \qquad (13)$$
For $I_2$, similarly to the proof of $I_1$, for $\eta > 0$ we have
$$\begin{aligned}
P\left( I_2 > \eta \sqrt{\frac{s_{n,2}^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right)
&= P\left( \sup_{\chi \in S_{\mathcal F}} \frac{1}{n} \left| \sum_{i=1}^n \left[ \frac{Y_i K\big( d(\chi_{k(\chi)}, \chi_i) / h_n(\chi_{k(\chi)}) \big)}{E\big[ K\big( d(\chi_{k(\chi)}, \chi_1) / h_n(\chi_{k(\chi)}) \big) \big]} - E\left( \frac{Y_i K\big( d(\chi_{k(\chi)}, \chi_i) / h_n(\chi_{k(\chi)}) \big)}{E\big[ K\big( d(\chi_{k(\chi)}, \chi_1) / h_n(\chi_{k(\chi)}) \big) \big]} \right) \right] \right| > \eta \sqrt{\frac{s_{n,2}^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right) \\
&\le N_\epsilon(S_{\mathcal F}) \max_{\chi_k \in \{\chi_1, \ldots, \chi_{N_\epsilon(S_{\mathcal F})}\}} P\left( \frac{1}{n} \left| \sum_{i=1}^n v_i \right| > \eta \sqrt{\frac{s_{n,2}^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right) \\
&\le C\, N_\epsilon(S_{\mathcal F}) \max_{\chi_k \in \{\chi_1, \ldots, \chi_{N_\epsilon(S_{\mathcal F})}\}} P\left( \frac{1}{n} \sum_{i=1}^n |Y_i|\, \mathbb{1}_{B(\chi_k,\, C_2 h_n(\chi) + \epsilon)}(\chi_i) > \eta \sqrt{\frac{s_{n,2}^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right).
\end{aligned}$$
Thus,
$$I_2 = O_{a.co.}\left( \sqrt{\frac{s_{n,2}^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right). \qquad (14)$$
Finally, for $I_3$, we have $I_3 \le E\Big[ \sup_{\chi \in S_{\mathcal F}} \big| \hat m_{2n}(\chi_{k(\chi)}) - \hat m_{2n}(\chi) \big| \Big]$. Proceeding as for $I_1$, we obtain
$$I_3 = O_{a.co.}\left( \sqrt{\frac{s_{n,1}^2\, \Psi_{S_{\mathcal F}}(\epsilon)}{n^2}} \right). \qquad (15)$$
Therefore, combining Equations (13)–(15), Equation (11) is established. Equation (12) can be proved similarly. Hence, the proof of Lemma 3 is completed. □
Proof of Theorem 1. 
To apply Lemma 1, set $S_\Omega = S_{\mathcal F}$, $A_i = \chi_i$, $B_i = Y_i$, $G\big(t, (\chi, A_i)\big) = K\big(t^{-1} d(\chi, \chi_i)\big)$, $D_n(\chi) = H_{n,k}(\chi)$, $c_{n,\chi}\big(D_n(\chi)\big) = \hat m_{kNN}(\chi)$ and $c(\chi) = m(\chi)$. Let $\beta_n \in (0, 1)$ be an increasing sequence such that $\beta_n - 1 = O(u_n)$, where
$$u_n = \Big( \phi^{-1}\Big(\frac{k_n}{n}\Big) \Big)^{\beta} + \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\log n / n)}{n^2}}$$
is a decreasing positive real sequence with $\lim_{n\to\infty} u_n = 0$, and set $h_n = \phi^{-1}(k_n / n)$. Let $\{D_n^-(\beta_n, \chi)\}_{n \in \mathbb N}$ and $\{D_n^+(\beta_n, \chi)\}_{n \in \mathbb N}$ be two sequences of real random variables such that
$$\varphi_\chi\big(D_n^-(\beta_n, \chi)\big) = \varphi_\chi\big(h_n(\chi)\big)\, \beta_n^{1/2}, \qquad (16)$$
$$\varphi_\chi\big(D_n^+(\beta_n, \chi)\big) = \varphi_\chi\big(h_n(\chi)\big)\, \beta_n^{-1/2}. \qquad (17)$$
Firstly, we verify conditions (L4) and (L5) of Lemma 1. By (16) and $\beta_n - 1 = O(u_n)$, the local bandwidth $D_n^-(\beta_n, \chi)$ satisfies condition (5). Combining $h_n = \phi^{-1}(k_n / n)$ with Assumption 2, it follows that $h_n(\chi)$ satisfies condition (6). Letting $k_n = n\phi(h_n)$, Assumption 2(i) gives $k_n / n = n\phi(h_n)/n = \phi(h_n) \to 0$. Hence, by the conditions of Theorem 1, Equations (7) and (8) of Lemma 3 hold. Thus, by Lemma 3, we have
$$\sup_{\chi \in S_{\mathcal F}} \big| c_{n,\chi}\big(D_n^-(\beta_n, \chi)\big) - c(\chi) \big| = O_{a.co.}\left( \Big( \phi^{-1}\Big(\frac{k_n}{n}\Big) \Big)^{\beta} + \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\log n / n)}{n^2}} \right) = O_{a.co.}(u_n).$$
Similarly, for $D_n^+(\beta_n, \chi)$, we also get
$$\sup_{\chi \in S_{\mathcal F}} \big| c_{n,\chi}\big(D_n^+(\beta_n, \chi)\big) - c(\chi) \big| = O_{a.co.}\left( \Big( \phi^{-1}\Big(\frac{k_n}{n}\Big) \Big)^{\beta} + \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\log n / n)}{n^2}} \right) = O_{a.co.}(u_n).$$
Secondly, we check conditions (L1) and (L2) of Lemma 1. Combining (16) and (17) with $\beta_n \in (0, 1)$, it clearly follows that
$$\varphi_\chi\big(D_n^-(\beta_n, \chi)\big) \le \varphi_\chi\big(h_n(\chi)\big) \le \varphi_\chi\big(D_n^+(\beta_n, \chi)\big),$$
and by Assumption 1 we get
$$D_n^-(\beta_n, \chi) \le h_n(\chi) \le D_n^+(\beta_n, \chi). \qquad (18)$$
According to (5) and (18), for $n \to \infty$, we have
$$\varphi_\chi\big(D_n^-(\beta_n, \chi)\big) \le \varphi_\chi\big(H_{n,k}(\chi)\big) \le \varphi_\chi\big(D_n^+(\beta_n, \chi)\big),$$
that is,
$$\varphi_\chi\big(D_n^-(\beta_n, \chi)\big) \le \varphi_\chi\big(D_n(\chi)\big) \le \varphi_\chi\big(D_n^+(\beta_n, \chi)\big).$$
Therefore, by Assumption 1, we can get
$$D_n^-(\beta_n, \chi) \le D_n(\chi) \le D_n^+(\beta_n, \chi),$$
and thus
$$\mathbb{1}_{\{D_n^-(\beta_n,\chi) \le D_n(\chi) \le D_n^+(\beta_n,\chi),\ \forall \chi \in S_{\mathcal F}\}} \to 1, \quad a.co., \quad n \to \infty.$$
(L2) is checked.
Finally, we establish condition (L3) of Lemma 1. Following Kudraszow and Vieu [19], we denote
$$f^*\big(\chi, h_n(\chi)\big) = E\Big[ K\big(h_n(\chi)^{-1} d(\chi, \chi_1)\big) \Big], \quad \chi \in S_{\mathcal F},$$
and let
$$F_1 = \frac{f^*\big(\chi, D_n^-(\beta_n, \chi)\big)}{f^*\big(\chi, D_n^+(\beta_n, \chi)\big)}, \qquad F_2 = \frac{\hat m_{1n}\big(\chi, D_n^-(\beta_n, \chi)\big)}{\hat m_{1n}\big(\chi, D_n^+(\beta_n, \chi)\big)} - 1, \qquad F_3 = \frac{f^*\big(\chi, D_n^+(\beta_n, \chi)\big)}{f^*\big(\chi, D_n^-(\beta_n, \chi)\big)}\, \beta_n - 1.$$
Then, for each $\chi \in S_{\mathcal F}$, the quantity in (L3) can be decomposed as follows:
$$\begin{aligned}
\left| \frac{\sum_{i=1}^n G\big(D_n^-(\beta_n, \chi), (\chi, A_i)\big)}{\sum_{i=1}^n G\big(D_n^+(\beta_n, \chi), (\chi, A_i)\big)} - \beta_n \right|
&= \left| \frac{\hat m_{1n}\big(\chi, D_n^-(\beta_n, \chi)\big)}{\hat m_{1n}\big(\chi, D_n^+(\beta_n, \chi)\big)} \cdot \frac{f^*\big(\chi, D_n^-(\beta_n, \chi)\big)}{f^*\big(\chi, D_n^+(\beta_n, \chi)\big)} - \frac{f^*\big(\chi, D_n^+(\beta_n, \chi)\big)}{f^*\big(\chi, D_n^-(\beta_n, \chi)\big)} \cdot \frac{f^*\big(\chi, D_n^-(\beta_n, \chi)\big)}{f^*\big(\chi, D_n^+(\beta_n, \chi)\big)}\, \beta_n \right| \\
&= \left| \frac{f^*\big(\chi, D_n^-(\beta_n, \chi)\big)}{f^*\big(\chi, D_n^+(\beta_n, \chi)\big)} \right| \cdot \left| \frac{\hat m_{1n}\big(\chi, D_n^-(\beta_n, \chi)\big)}{\hat m_{1n}\big(\chi, D_n^+(\beta_n, \chi)\big)} - \frac{f^*\big(\chi, D_n^+(\beta_n, \chi)\big)}{f^*\big(\chi, D_n^-(\beta_n, \chi)\big)}\, \beta_n \right| \\
&\le \left| \frac{f^*\big(\chi, D_n^-(\beta_n, \chi)\big)}{f^*\big(\chi, D_n^+(\beta_n, \chi)\big)} \right| \left( \left| \frac{\hat m_{1n}\big(\chi, D_n^-(\beta_n, \chi)\big)}{\hat m_{1n}\big(\chi, D_n^+(\beta_n, \chi)\big)} - 1 \right| + \left| \frac{f^*\big(\chi, D_n^+(\beta_n, \chi)\big)}{f^*\big(\chi, D_n^-(\beta_n, \chi)\big)}\, \beta_n - 1 \right| \right) \\
&\le |F_1| |F_2| + |F_1| |F_3|. \qquad (19)
\end{aligned}$$
By Assumption 3, it follows that
$$\sup_{\chi \in S_{\mathcal F}} |F_1| \le C, \qquad (20)$$
and, for $\chi \in S_{\mathcal F}$, with $\hat m_{1n}(\chi) = \dfrac{\sum_{i=1}^n K\big(h_n(\chi)^{-1} d(\chi, \chi_i)\big)}{n E\big[ K\big(h_n(\chi)^{-1} d(\chi, \chi_1)\big) \big]}$, referring to Ferraty et al. [25], we have
$$\sup_{\chi \in S_{\mathcal F}} \big| \hat m_{1n}(\chi) - 1 \big| = O_{a.co.}\left( \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\log n / n)}{n^2}} \right).$$
Therefore,
$$\sup_{\chi \in S_{\mathcal F}} |F_2| = \sup_{\chi \in S_{\mathcal F}} \left| \frac{\hat m_{1n}\big(\chi, D_n^-(\beta_n, \chi)\big)}{\hat m_{1n}\big(\chi, D_n^+(\beta_n, \chi)\big)} - 1 \right| = \sup_{\chi \in S_{\mathcal F}} \left| \frac{\hat m_{1n}\big(\chi, D_n^-(\beta_n, \chi)\big) - 1 + 1 - \hat m_{1n}\big(\chi, D_n^+(\beta_n, \chi)\big)}{\hat m_{1n}\big(\chi, D_n^+(\beta_n, \chi)\big)} \right| \le \frac{\sup_{\chi \in S_{\mathcal F}} \big| \hat m_{1n}\big(\chi, D_n^-(\beta_n, \chi)\big) - 1 \big| + \sup_{\chi \in S_{\mathcal F}} \big| \hat m_{1n}\big(\chi, D_n^+(\beta_n, \chi)\big) - 1 \big|}{\inf_{\chi \in S_{\mathcal F}} \hat m_{1n}\big(\chi, D_n^+(\beta_n, \chi)\big)} = O_{a.co.}\left( \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\log n / n)}{n^2}} \right). \qquad (21)$$
Moreover, for $F_3$, according to Lemma 1 in Ezzahrioui and Ould-Saïd [13] and Assumption 2(iii), there exists $\tau > 0$ such that, for $\chi \in S_{\mathcal F}$,
$$f^*\big(\chi, h_n(\chi)\big) = \phi\big(h_n(\chi)\big)\, \tau\, f(\chi) + O\big( \phi(h_n(\chi))\, h_n(\chi)^{\beta} \big) = \tau\, \varphi_\chi\big(h_n(\chi)\big) + O\big( \phi(h_n)\, h_n^{\beta} \big).$$
Since $\varphi_\chi\big(D_n^-(\beta_n, \chi)\big) / \varphi_\chi\big(D_n^+(\beta_n, \chi)\big) = \beta_n$, it follows that $\sup_{\chi \in S_{\mathcal F}} |F_3| = O\big( \phi(h_n)\, h_n^{\beta} \big) = O\big( \beta_n\, (\phi^{-1}(k_n/n))^{\beta} \big)$. Hence, for $\beta_n \to 1$,
$$\sup_{\chi \in S_{\mathcal F}} |F_3| = O\Big( \big( \phi^{-1}(k_n / n) \big)^{\beta} \Big). \qquad (22)$$
Combining (19)–(22), we obtain
$$\sup_{\chi \in S_{\mathcal F}} \left| \frac{\sum_{i=1}^n G\big(D_n^-(\beta_n, \chi), (\chi, A_i)\big)}{\sum_{i=1}^n G\big(D_n^+(\beta_n, \chi), (\chi, A_i)\big)} - \beta_n \right| = O_{a.co.}(u_n).$$
(L3) is established. Thus, conditions (L1)–(L5) of Lemma 1 have all been verified, and by Lemma 1 we get
$$\sup_{\chi \in S_{\mathcal F}} \big| \hat m_{kNN}(\chi) - m(\chi) \big| = O_{a.co.}\left( \Big( \phi^{-1}\Big(\frac{k_n}{n}\Big) \Big)^{\beta} + \sqrt{\frac{s_n^2\, \Psi_{S_{\mathcal F}}(\log n / n)}{n^2}} \right).$$
The proof of Theorem 1 is completed. □

5. Conclusions and Future Research

Functional data analysis deals with the theory and analysis of data that come in the form of functions, images and shapes, or more general objects. In a way, dependence is at the heart of data science: the relation between observations may range from simple independence to α-mixing or other structures, such as negative association (NA). The kNN method, as one of the nonparametric methods, is very useful in statistical estimation and machine learning. Regression analysis of functional data has been explored under many dependence structures, but not under NA sequences. This paper builds a kNN regression estimator of the functional nonparametric regression model; in particular, we obtain the uniform almost complete convergence rate of the kNN estimator. Simulation experiments and a real data analysis illustrate the feasibility and finite-sample behavior of the method. Further work includes developing kNN machine learning algorithms for functional data analysis and kNN high-dimensional modeling with NA sequences.

Author Contributions

Conceptualization, X.H. and J.W.; methodology, X.H.; software, J.W.; writing—original draft preparation, X.H. and J.W.; writing—review and editing, K.Y. and L.W.; visualization, K.Y.; supervision, X.H.; project administration, K.Y.; funding acquisition, K.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Foundation (Grant No. 21BTJ040).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

https://www.cpc.ncep.noaa.gov/data/indices/ (accessed on 9 January 2022).

Acknowledgments

The authors are most grateful to the Editor and anonymous referee for carefully reading the manuscript and for valuable suggestions which helped in improving an earlier version of this paper. This research was funded by the National Social Science Foundation (Grant No. 21BTJ040).

Conflicts of Interest

The authors declare no conflict of interest in this paper.

Abbreviations

The following abbreviations are used in this manuscript:
NA    Negatively Associated
kNN   k-Nearest Neighbor

References

1. Ramsay, J.; Dalzell, C. Some Tools for Functional Data Analysis. J. R. Stat. Soc. Ser. B Methodol. 1991, 53, 539–561.
2. Ramsay, J.; Silverman, B. Functional Data Analysis; Springer: New York, NY, USA, 1997.
3. Ramsay, J.; Silverman, B. Functional Data Analysis, 2nd ed.; Springer: New York, NY, USA, 2005.
4. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis; Springer: New York, NY, USA, 2006.
5. Ferraty, F.; Vieu, P. Nonparametric Models for Functional Data, with Application in Regression, Time Series Prediction and Curve Discrimination. J. Nonparametr. Stat. 2004, 16, 111–125.
6. Ling, N.X.; Wu, Y.H. Consistency of Modified Kernel Regression Estimation with Functional Data. Statistics 2012, 46, 149–158.
7. Baíllo, A.; Grané, A. Local Linear Regression for Functional Predictor and Scalar Response. J. Multivar. Anal. 2009, 100, 102–111.
8. Fix, E.; Hodges, J. Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties. Int. Stat. Rev. 1989, 57, 238–247.
9. Altman, N.S. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. Am. Stat. 1992, 46, 175–185.
10. Burba, F.; Ferraty, F.; Vieu, P. k-Nearest Neighbour Method in Functional Nonparametric Regression. J. Nonparametr. Stat. 2009, 21, 453–469.
11. Masry, E. Nonparametric Regression Estimation for Dependent Functional Data: Asymptotic Normality. Stoch. Process. Appl. 2005, 115, 155–177.
12. Laib, N.; Louani, D. Rates of Strong Consistencies of the Regression Function Estimator for Functional Stationary Ergodic Data. J. Stat. Plan. Inference 2011, 141, 359–372.
13. Ezzahrioui, M.; Ould-Saïd, E. Asymptotic Normality of a Nonparametric Estimator of the Conditional Mode Function for Functional Data. J. Nonparametr. Stat. 2008, 20, 3–18.
14. Ezzahrioui, M.; Ould-Saïd, E. Some Asymptotic Results of a Nonparametric Conditional Mode Estimator for Functional Time-Series Data. Stat. Neerl. 2010, 64, 171–201.
15. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer: New York, NY, USA, 2012.
16. Ling, N.X.; Wang, C.; Ling, J. Modified Kernel Regression Estimation with Functional Time Series Data. Stat. Probab. Lett. 2016, 114, 78–85.
17. Abdelmalek, G.; Abdelhak, C. Strong Uniform Consistency Rates of the Local Linear Estimation of the Conditional Hazard Estimator for Functional Data. Int. J. Appl. Math. Stat. 2020, 59, 1–13.
18. Mustapha, M.; Salim, B.; Ali, L. The Consistency and Asymptotic Normality of the Kernel Type Expectile Regression Estimator for Functional Data. J. Multivar. Anal. 2021, 181.
19. Kudraszow, N.L.; Vieu, P. Uniform Consistency of kNN Regressors for Functional Variables. Stat. Probab. Lett. 2013, 83, 1863–1870.
20. Kara-Zaitri, L.; Laksaci, A.; Rachdi, M.; Vieu, P. Data-Driven kNN Estimation in Nonparametric Functional Data Analysis. J. Multivar. Anal. 2017, 153, 176–188.
21. Joag-Dev, K.; Proschan, F. Negative Association of Random Variables with Application. Ann. Stat. 1983, 11, 286–295.
22. Wu, Y.; Wang, X.; Sung, S.H. Complete Moment Convergence for Arrays of Rowwise Negatively Associated Random Variables and Its Application in Non-parametric Regression Model. Probab. Eng. Inf. Sci. 2017, 32, 37–57.
23. Delsol, L. Advances on Asymptotic Normality in Nonparametric Functional Time Series Analysis. Statistics 2009, 43, 13–33.
24. Ferraty, F.; Rabhi, A.; Vieu, P. Conditional Quantiles for Dependent Functional Data with Application to the Climatic El Niño Phenomenon. Sankhyā Indian J. Stat. 2005, 67, 378–398.
25. Ferraty, F.; Laksaci, A.; Tadj, A.; Vieu, P. Rate of Uniform Consistency for Nonparametric Estimates with Functional Variables. J. Stat. Plan. Inference 2010, 140, 335–352.
26. Christofides, T.C.; Hadjikyriakou, M. Exponential Inequalities for N-demimartingales and Negatively Associated Random Variables. Stat. Probab. Lett. 2009, 79, 2060–2065.
Figure 1. Curve sample with sample size n = 330.
Figure 2. Prediction effects of the two estimation methods. (a) kNN estimation method. (b) NW estimation method.
Figure 3. Monthly mean SST factor decomposition fitting comprehensive output diagram.
Figure 4. Time series curve of SST in the El Niño region during 31 years.
Figure 5. Forecast values of SST in 2020 by the kNN method and the NW method.
Table 1. The AMSE of the predicted response variables of the two methods under different sample sizes.

n           200      300      500      800
kNN-AMSE    0.0623   0.0470   0.0327   0.0291
NW-AMSE     0.2764   0.2593   0.2329   0.2117
