1. Introduction
The estimation of a probability density from independent and identically distributed (i.i.d.) random observations of $X$ is a classical problem in statistics. A representative work is Donoho et al. [1], who established an adaptive and nearly-optimal estimate (up to a logarithmic factor) over Besov spaces using wavelets.
However, in many real-life applications the observed data are polluted by noise. One important problem is density estimation with additive noise. Let $Z_1, Z_2, \ldots, Z_n$ be i.i.d. random variables with the same distribution as
$$Z = X + Y, \qquad (1)$$
where $X$ denotes a real-valued random variable with unknown probability density function $f$ and $Y$ stands for an independent random noise (error) with a known probability density $g$. The problem is to estimate $f$ from $Z_1, Z_2, \ldots, Z_n$ in some sense. Moreover, it is also called a deconvolution problem (model), because the density $h$ of $Z$ equals the convolution of $f$ and $g$. Fan and Koo [2] studied the MISE performance ($L^2$-risk) of a linear wavelet deconvolution estimator over a Besov ball. The $L^\infty$-risk optimal wavelet estimations were investigated by Lounici and Nickl [3]. Furthermore, Li and Liu [4] provided $L^p$-risk optimal deconvolution estimations using wavelet bases.
In this paper, we consider a generalized deconvolution model introduced by Lepski and Willer [5,6]. More precisely, let $(\Omega, \mathcal{F}, P)$ be a probability space and $Z_1, Z_2, \ldots, Z_n$ be i.i.d. random variables having the same distribution as
$$Z = X + \epsilon Y, \qquad (2)$$
where the symbols $X$ and $Y$ are the same as in model (1), and $f$ and $g$ are the corresponding densities, respectively. Moreover, the biggest difference from model (1) is that a Bernoulli random variable $\epsilon$ with $P(\epsilon = 1) = \alpha \in [0, 1]$ is added in (2), and $\alpha$ is known. The problem is again to estimate $f$ from the observed data $Z_1, Z_2, \ldots, Z_n$ in some sense.
When $\alpha = 1$, model (2) reduces to the deconvolution one (see [2,3,4,7,8] et al.), while $\alpha = 0$ corresponds to the classical density model with no errors (see [1,9,10,11] et al.). Clearly, the density function $h$ of $Z$ in (2) satisfies
$$h(x) = (1 - \alpha) f(x) + \alpha (f * g)(x).$$
Here, $f * g$ stands for the convolution of $f$ and $g$. Taking Fourier transforms in the above identity gives $h^{ft}(t) = f^{ft}(t)\,[(1 - \alpha) + \alpha g^{ft}(t)]$. Furthermore, when the function $(1 - \alpha) + \alpha g^{ft}(t) \neq 0$ for $t \in \mathbb{R}$, we have
$$f^{ft}(t) = \frac{h^{ft}(t)}{(1 - \alpha) + \alpha g^{ft}(t)},$$
where $g$, $\alpha$ are known and $v^{ft}$ is the Fourier transform of $v$ given by
$$v^{ft}(t) := \int_{\mathbb{R}} v(x)\, e^{-itx}\, dx.$$
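The mixture identity for the density of $Z$ can be sanity-checked by simulation. The sketch below draws $Z = X + \epsilon Y$ and compares the empirical CDF of $Z$ with the mixture CDF $(1-\alpha)F_X + \alpha F_{X+Y}$; the Gaussian choices for $X$ and $Y$ are illustrative assumptions only, not part of the model.

```python
import math
import random

# Monte Carlo check of h = (1 - alpha) f + alpha (f * g):
# simulate Z = X + eps * Y with eps ~ Bernoulli(alpha) and compare the
# empirical CDF of Z with the mixture CDF at a test point.
# Illustrative assumption: X, Y ~ N(0, 1), so X + Y ~ N(0, 2).
random.seed(0)
alpha = 0.3
n = 200_000

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

z_sample = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    y = random.gauss(0.0, 1.0)
    eps = 1 if random.random() < alpha else 0   # Bernoulli(alpha)
    z_sample.append(x + eps * y)

t = 0.7
empirical = sum(1 for z in z_sample if z <= t) / n
mixture = (1 - alpha) * std_normal_cdf(t) + alpha * std_normal_cdf(t / math.sqrt(2.0))
print(abs(empirical - mixture))   # small (Monte Carlo error, roughly O(n^{-1/2}))
```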
Based on the model (2) with some mild assumptions on $g$, Lepski and Willer [5] provided a lower bound estimation over $L^p$ risk on an anisotropic Nikol'skii space. Moreover, they investigated an adaptive and optimal $L^p$ estimate by using the kernel method in Ref. [6]. Recently, Wu et al. [12] established a pointwise lower bound estimation for model (2) under the local Hölder condition.
Compared with the classical kernel estimation of density functions, wavelet estimations provide more local information and fast algorithms [13]. We will consider the $L^p$ risk estimations under the model (2) over Besov balls by using wavelets and expect to obtain the corresponding convergence rates.
The same as Assumption 4 in [6], we also need the following condition on $Y$:
$$|(1 - \alpha) + \alpha g^{ft}(t)| \gtrsim (1 + t^2)^{-\beta/2} \qquad (3)$$
with $\beta > 0$ for $\alpha = 1$ and $\beta = 0$ for others. It is reasonable, because it holds automatically for $\alpha = 0$, while the same condition for $\alpha = 1$ is necessary for the deconvolution estimations [4,7]. In addition, when $\alpha \in (0, 1)$, $(1 - \alpha) + \alpha g^{ft}(t) \neq 0$ and $|(1 - \alpha) + \alpha g^{ft}(t)| \gtrsim 1$ thanks to the Riemann–Lebesgue lemma ($g^{ft}(t) \to 0$ as $|t| \to \infty$). In fact, the condition (3) is necessary to prove Lemmas 2 and 3 in Section 2. Here and after, $A \lesssim B$ denotes $A \le cB$ for a fixed constant $c > 0$; $A \gtrsim B$ means $B \lesssim A$; $A \sim B$ stands for both $A \lesssim B$ and $A \gtrsim B$.
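For $\alpha \in (0, 1)$, the quantity $(1 - \alpha) + \alpha g^{ft}(t)$ is bounded below for many noise densities, which is why $\beta = 0$ suffices in that regime. A minimal numeric sketch, assuming (for illustration only) a standard Gaussian noise density, whose Fourier transform $g^{ft}(t) = e^{-t^2/2}$ lies in $(0, 1]$:

```python
import math

# For alpha in (0, 1) and g^ft >= 0, the denominator (1 - alpha) + alpha * g^ft(t)
# appearing in the Fourier inversion never drops below 1 - alpha.
# Illustrative assumption: standard Gaussian noise, g^ft(t) = exp(-t^2 / 2).
alpha = 0.4

def denominator(t):
    return (1.0 - alpha) + alpha * math.exp(-t * t / 2.0)

# scan t over [-50, 50] on a fine grid
values = [denominator(t / 10.0) for t in range(-500, 501)]
print(min(values))   # stays above 1 - alpha = 0.6
```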
It is well-known that the wavelet estimation depends on an orthonormal wavelet expansion in $L^2(\mathbb{R})$, even in $L^p(\mathbb{R})$. Let $\{V_j\}_{j \in \mathbb{Z}}$ be a classical Multiresolution Analysis of $L^2(\mathbb{R})$ with scaling function $\varphi$ and $\psi$ being the corresponding wavelet. Subsequently, for $f \in L^2(\mathbb{R})$,
$$f = \sum_{k \in \mathbb{Z}} \alpha_{j_0, k}\, \varphi_{j_0, k} + \sum_{j \ge j_0} \sum_{k \in \mathbb{Z}} \beta_{j, k}\, \psi_{j, k}, \qquad (4)$$
where $\alpha_{j_0, k} := \langle f, \varphi_{j_0, k} \rangle$, $\beta_{j, k} := \langle f, \psi_{j, k} \rangle$ and $\vartheta_{j, k}(x) := 2^{j/2} \vartheta(2^j x - k)$ for $\vartheta \in \{\varphi, \psi\}$. A scaling function $\varphi$ is called $m$-regular ($m \in \mathbb{N}$), if $\varphi \in C^m(\mathbb{R})$ and $|\varphi^{(d)}(x)| \lesssim (1 + x^2)^{-\ell}$ for each $\ell \in \mathbb{N}$ ($d = 0, 1, \ldots, m$). Clearly, the $m$-regularity of $\varphi$ implies that of the corresponding $\psi$, and $\int_{\mathbb{R}} x^d \psi(x)\, dx = 0$ ($d = 0, 1, \ldots, m$) due to the integration by parts. An important example is Daubechies' function $D_{2N}$ with $N$ large enough.
As usual, let $P_j$ be the orthogonal projection from $L^2(\mathbb{R})$ onto the scaling space $V_j$,
$$P_j f = \sum_{k \in \mathbb{Z}} \alpha_{j, k}\, \varphi_{j, k}.$$
If $\varphi$ is $m$-regular, then $P_j f$ is well-defined for $f \in L^p(\mathbb{R})$ with $p \in [1, \infty]$. Moreover, the identity (4) holds in $L^p(\mathbb{R})$ for $p \in [1, \infty)$.
The following lemma is needed for later discussions.
Lemma 1 ([13]). Let $\vartheta$ be an orthogonal scaling function or a wavelet satisfying $m$-regularity. Subsequently, there exist $c_2 \ge c_1 > 0$ such that, for $p \in [1, \infty]$ and $\{\lambda_k\} \in \ell^p(\mathbb{Z})$,
$$c_1\, 2^{j(\frac{1}{2} - \frac{1}{p})} \|\lambda\|_{\ell^p} \le \Big\| \sum_{k \in \mathbb{Z}} \lambda_k\, \vartheta_{j, k} \Big\|_p \le c_2\, 2^{j(\frac{1}{2} - \frac{1}{p})} \|\lambda\|_{\ell^p}.$$

One of the advantages of wavelet bases is that they can characterize Besov spaces, which contain the $L^2$-Sobolev spaces and Hölder spaces as special examples.
Proposition 1 ([13]). Let the scaling function $\varphi$ be $m$-regular with $m > s > 0$ and $\psi$ be the corresponding wavelet. Subsequently, for $r, q \in [1, \infty]$ and $f \in L^r(\mathbb{R})$, the following conditions are equivalent:
- (i) $f \in B^s_{r, q}(\mathbb{R})$;
- (ii) $\{2^{js} \|P_{j+1} f - P_j f\|_r\}_{j \ge j_0} \in \ell^q$; and
- (iii) $\{2^{j(s - \frac{1}{r} + \frac{1}{2})} \|\beta_{j, \cdot}\|_{\ell^r}\}_{j \ge j_0} \in \ell^q$.

The Besov norm can be defined by
$$\|f\|_{B^s_{r, q}} := \|\alpha_{j_0, \cdot}\|_{\ell^r} + \big\| \{2^{j(s - \frac{1}{r} + \frac{1}{2})} \|\beta_{j, \cdot}\|_{\ell^r}\}_{j \ge j_0} \big\|_{\ell^q}.$$

When $s > 0$ and $r, q \in [1, \infty]$, it is well-known that
- (1) $B^s_{r, q}(\mathbb{R}) \hookrightarrow B^{s - \frac{1}{r} + \frac{1}{p}}_{p, q}(\mathbb{R})$ for $r \le p$;
- (2) $B^s_{r, q}(\mathbb{R}) \hookrightarrow B^s_{p, q}(\mathbb{R})$ for $p \le r$ and compactly supported $f$,

where $A \hookrightarrow B$ stands for a Banach space $A$ continuously embedded in another Banach space $B$. More precisely, $\|f\|_B \le c \|f\|_A$ holds for some $c > 0$.
In this paper, we use the notation $B^s_{r, q}(H)$ with some constant $H > 0$ to stand for a Besov ball, i.e.,
$$B^s_{r, q}(H) := \{f \in B^s_{r, q}(\mathbb{R}) : \|f\|_{B^s_{r, q}} \le H\}.$$
Next, we will estimate $f$ under $L^p$ risk by constructing wavelet estimators from the observed data $Z_1, Z_2, \ldots, Z_n$. To introduce the wavelet estimators, we take a scaling function $\varphi$ (e.g., Daubechies' $D_{2N}$) having compact support and $m$-regularity with $m > s$ in this paper. Moreover, denote
$$\hat{\alpha}_{j, k} := \frac{1}{n} \sum_{i = 1}^{n} (\mathcal{K} \varphi_{j, k})(Z_i), \qquad (5)$$
where the linear operator $\mathcal{K}$ is defined by
$$(\mathcal{K} \varphi_{j, k})^{ft} := \frac{\varphi_{j, k}^{ft}}{(1 - \alpha) + \alpha\, \overline{g^{ft}}}. \qquad (6)$$
Clearly, $E \hat{\alpha}_{j, k} = \alpha_{j, k}$ due to the Plancherel formula. Subsequently, the linear wavelet estimator is given by
$$\hat{f}_n^{lin}(x) := \sum_{k \in \Lambda_{j_0}} \hat{\alpha}_{j_0, k}\, \varphi_{j_0, k}(x), \qquad (7)$$
where $\Lambda_{j_0} := \{k \in \mathbb{Z} : \operatorname{supp} f \cap \operatorname{supp} \varphi_{j_0, k} \neq \emptyset\}$. In particular, the cardinality of $\Lambda_{j_0}$ satisfies $|\Lambda_{j_0}| \lesssim 2^{j_0}$, when $f$ and $\varphi$ have compact supports.
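The linear estimator (7) is easy to prototype in the noise-free case $\alpha = 0$, where $\mathcal{K}$ reduces to the identity and $\hat{\alpha}_{j,k}$ is a simple empirical mean. The sketch below uses the Haar scaling function and a Uniform(0,1) sample; both are illustrative assumptions only (Haar is not $m$-regular, so this is not the paper's setting), under which the estimator reduces to a dyadic histogram.

```python
import math
import random

# Sketch of the linear wavelet estimator (7) with alpha = 0 (no noise):
# alpha_hat_{j,k} = (1/n) sum_i phi_{j,k}(Z_i) with the Haar scaling
# function phi = 1_[0,1), and f_lin(x) = sum_k alpha_hat_{j0,k} phi_{j0,k}(x).
random.seed(1)
j0 = 4                       # resolution level: 2^j0 dyadic bins on [0, 1)
n = 50_000
sample = [random.random() for _ in range(n)]   # illustrative Uniform(0,1) data

def alpha_hat(j, k, data):
    # empirical scaling coefficient, phi_{j,k}(x) = 2^{j/2} phi(2^j x - k)
    scale = 2.0 ** (j / 2)
    return scale * sum(1 for z in data if 0.0 <= (2 ** j) * z - k < 1.0) / len(data)

def f_lin(x, j, data):
    # only one Haar scaling function is non-zero at x
    k = math.floor((2 ** j) * x)
    return 2.0 ** (j / 2) * alpha_hat(j, k, data)

# sanity check: the estimator integrates to one over [0, 1]
total = sum(f_lin((k + 0.5) / 2 ** j0, j0, sample) * 2.0 ** (-j0)
            for k in range(2 ** j0))
print(total)   # 1.0 up to floating-point error
```

Each bin's value is $2^{j_0}$ times the fraction of the sample falling in it, so the Riemann sum over the $2^{j_0}$ bins recovers the total mass exactly.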
Now, we are in a position to state the first result of this paper.
Theorem 1. For $p \in [1, \infty)$ and $f \in B^s_{r, q}(H)$ with $r, q \in [1, \infty]$ and $s > 1/r$, the estimator $\hat{f}_n^{lin}$ in (7) with $2^{j_0} \sim n^{\frac{1}{2s' + 2\beta + 1}}$ satisfies
$$E \|\hat{f}_n^{lin} - f\|_p^p \lesssim n^{-\frac{p s'}{2 s' + 2\beta + 1}},$$
where $s' := s - (\frac{1}{r} - \frac{1}{p})_+$ and $x_+ := \max\{x, 0\}$.
Remark 1. When $\alpha = 1$ and $\beta > 0$, the conclusion of Theorem 1 reduces to Theorem 3 of Li & Liu [4].

Note that the estimator $\hat{f}_n^{lin}$ is non-adaptive, because the choice of $j_0$ depends on the unknown parameter $s$. To obtain an adaptive estimate, define
$$\hat{\beta}_{j, k} := \frac{1}{n} \sum_{i = 1}^{n} (\mathcal{K} \psi_{j, k})(Z_i). \qquad (8)$$
Here, $t_n := 2^{j\beta} \sqrt{\frac{\ln n}{n}}$ and the constant $\kappa > 0$ will be determined later on. Subsequently, the non-linear wavelet estimator is defined by
$$\hat{f}_n^{non}(x) := \sum_{k \in \Lambda_{j_0}} \hat{\alpha}_{j_0, k}\, \varphi_{j_0, k}(x) + \sum_{j = j_0}^{j_1} \sum_{k \in \Lambda_j} \hat{\beta}_{j, k}\, 1_{\{|\hat{\beta}_{j, k}| > \kappa t_n\}}\, \psi_{j, k}(x), \qquad (9)$$
where $j_0$ and $j_1$ are positive integers satisfying $2^{j_0} \sim n^{\frac{1}{2m + 1}}$ and $2^{j_1} \sim \big(\frac{n}{\ln n}\big)^{\frac{1}{2\beta + 1}}$, respectively. Clearly, $j_0$ and $j_1$ do not depend on the unknown parameters $s$, $r$ and $q$, which means that the estimator $\hat{f}_n^{non}$ in (9) is adaptive.
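The thresholding rule in (9) keeps an empirical wavelet coefficient only when it exceeds a level of order $\sqrt{\ln n / n}$. A minimal sketch, again under the illustrative assumptions $\alpha = 0$, Haar wavelet, and Uniform(0,1) data (so the true wavelet coefficients all vanish and every empirical coefficient should be killed):

```python
import math
import random

# Sketch of the hard-thresholding step in the non-linear estimator (9):
# keep beta_hat_{j,k} only when |beta_hat_{j,k}| exceeds a threshold of
# order sqrt(ln n / n). Illustrative assumptions: no noise (alpha = 0),
# Haar wavelet psi = 1_[0,1/2) - 1_[1/2,1), Uniform(0,1) sample.
random.seed(2)
n = 20_000
sample = [random.random() for _ in range(n)]

def beta_hat(j, k, data):
    # empirical wavelet coefficient (1/n) sum_i psi_{j,k}(Z_i)
    scale = 2.0 ** (j / 2)
    total = 0.0
    for z in data:
        u = (2 ** j) * z - k
        if 0.0 <= u < 0.5:
            total += scale
        elif 0.5 <= u < 1.0:
            total -= scale
    return total / len(data)

threshold = 2.0 * math.sqrt(math.log(n) / n)   # kappa * t_n with kappa = 2 (illustrative)
j = 3
survivors = [k for k in range(2 ** j) if abs(beta_hat(j, k, sample)) > threshold]
print(survivors)   # pure-noise coefficients are killed, so this is (almost surely) empty
```

Since each $\hat{\beta}_{j,k}$ here fluctuates at scale $n^{-1/2}$ around zero while the threshold is a multiple of $\sqrt{\ln n / n}$, spurious coefficients are suppressed with overwhelming probability, which is exactly the mechanism behind the adaptivity of (9).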
Theorem 2. Let $f \in B^s_{r, q}(H)$ with $r, q \in [1, \infty]$, $s > 1/r$ and $p \in [1, \infty)$. Then the estimator $\hat{f}_n^{non}$ in (9) satisfies
$$E \|\hat{f}_n^{non} - f\|_p^p \lesssim (\ln n)^p \Big(\frac{\ln n}{n}\Big)^{p\delta}, \quad \text{where} \quad \delta := \min\Big\{\frac{s}{2s + 2\beta + 1},\ \frac{s - \frac{1}{r} + \frac{1}{p}}{2(s - \frac{1}{r}) + 2\beta + 1}\Big\}.$$
Remark 2. When $\alpha = 0$ and $\beta = 0$, the convergence rate of Theorem 2 coincides with that of Theorem 3 in Donoho et al. [1]. On the other hand, when $\alpha = 1$ and $\beta > 0$, the conclusion of Theorem 4 in Li & Liu [4] follows directly from this theorem.

Remark 3. When comparing the result of Theorem 2 with Theorem 1, we find easily that for the case $r \le p$, the convergence rate of the non-linear estimator is better than that of the linear one.
Remark 4. The convergence rates of Theorem 2 in the cases $\alpha = 0$ and $\alpha = 1$ are nearly-optimal (up to a logarithmic factor) by Donoho et al. [1] and Li & Liu [4], respectively. However, it is not clear whether the estimation in Theorem 2 is optimal (nearly-optimal) or not for $\alpha \in (0, 1)$. Therefore, one of our future works is to determine a lower bound estimate for model (2) with $\alpha \in (0, 1)$. This problem may be much more complicated than the cases $\alpha = 0$ and $\alpha = 1$.