Article

Smooth Versions of the Mann–Whitney–Wilcoxon Statistics

Netti Herawati 1 and Ibrahim A. Ahmad 2
1 Department of Mathematics, University of Lampung, Bandar Lampung 35141, Indonesia
2 Department of Statistics, Oklahoma State University, Stillwater, OK 74078, USA
* Author to whom correspondence should be addressed.
Axioms 2022, 11(7), 300; https://doi.org/10.3390/axioms11070300
Submission received: 27 May 2022 / Revised: 6 June 2022 / Accepted: 15 June 2022 / Published: 21 June 2022
(This article belongs to the Section Mathematical Analysis)

Abstract: The well-known Mann–Whitney–Wilcoxon (MWW) statistic is based on the empirical distribution function. However, data are often drawn from smooth populations, so the empirical estimate does not preserve this smoothness; moreover, several authors have pointed out that the empirical distribution function is often an inadmissible estimate. In this work, we therefore develop smooth versions of the MWW statistic based on smooth distribution function estimates. This approach preserves the character of the data and improves the efficiency of the procedure. Our procedure is also shown to be robust against a large class of dependent observations. Hence, by choosing a suitable rectangular array of known distribution functions, our procedure yields a test that is considerably more reflective of the real data.

1. Introduction

Suppose that $X$ and $Y$ are two independent random variables with distribution functions (df's) $F$ and $G$, respectively. We say that $X$ is stochastically smaller than $Y$ if $F(x) \ge G(x)$ for all $x$. Testing $H_0: F(x) = G(x)$ for all $x$ against the alternative $H_1: F(x) \ge G(x)$ for all $x$, with strict inequality for some $x$, is carried out via the celebrated Mann–Whitney–Wilcoxon (MWW) statistic. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be two samples from $F$ and $G$, respectively. The MWW statistic is defined by the following:
$$p_{m,n} = (mn)^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n} I\{X_i < Y_j\}, \qquad (1)$$
where $I_A(a) = 1$ if $a \in A$ and $0$ otherwise [1,2]. Note that $p_{m,n}$ is the empirical estimate of $p = P(X < Y) = \int F(x)\,dG(x)$. Several generalizations of $p_{m,n}$ have been discussed in the literature with the aim of increasing the test efficiency, c.f. [3,4,5].
On the other hand, ref. [6] showed that the empirical df $F_m(x)$ is inadmissible with respect to the integrated mean squared error (IMSE). Therefore, extensive research has been carried out to find estimates of $F(x)$ that compete with $F_m(x)$. Additionally, work on estimating the corresponding probability density function (pdf) began with the pioneering work of [7,8], who suggested that a smooth estimate of $F(x)$ can be obtained by integrating the so-called “kernel pdf estimate” defined by the following:
$$\hat f(x) = \frac{1}{m a_m} \sum_{i=1}^{m} k\!\left(\frac{x - X_i}{a_m}\right), \qquad (2)$$
where $k(\cdot)$ is a known pdf and $\{a_m\}$ is a sequence of positive real numbers such that $a_m \to 0$ as $m \to \infty$. Hence, the kernel estimate of $F(x)$ is as follows:
$$\hat F_m(x) = m^{-1} \sum_{i=1}^{m} K\!\left(\frac{x - X_i}{a_m}\right), \qquad (3)$$
where $K(u) = \int_{-\infty}^{u} k(w)\,dw$.
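As an illustration, the following is a minimal sketch of the kernel df estimate in Equation (3); the Gaussian kernel, the function name, and the bandwidth choice are our illustrative assumptions, not prescriptions from the paper:

```python
import numpy as np
from scipy.stats import norm

def kernel_cdf(x_grid, sample, a_m):
    """Kernel df estimate of Equation (3): the average of K((x - X_i)/a_m)
    over the sample, with K taken as the standard normal df."""
    x_grid = np.asarray(x_grid, dtype=float)
    sample = np.asarray(sample, dtype=float)
    return norm.cdf((x_grid[:, None] - sample[None, :]) / a_m).mean(axis=1)

# Example: 200 standard normal draws; a_m = m^(-0.3) is an illustrative bandwidth.
rng = np.random.default_rng(1)
xs = rng.standard_normal(200)
print(kernel_cdf([0.0], xs, 200 ** -0.3))  # should be close to F(0) = 0.5
```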
A large number of studies have been carried out to investigate the properties of $\hat F_m(x)$, including the works of [9,10,11,12,13], among many other authors.
The idea of comparing the efficiency of $\hat F_m(x)$ to that of $F_m(x)$ started with the work of [14], who showed that the relative deficiency of $F_m(x)$ with respect to an appropriately selected $\hat F_m(x)$ tends to $\infty$ as $m \to \infty$ when the MISE is used as the criterion. This was followed by a large number of authors, among whom we mention [15,16,17], who showed the following:
$$\int E\{\hat F_m(x) - F(x)\}^2\,dF(x) = \frac{1}{6m} - \frac{2 a_m}{m}\, C \int f^2(x)\,dx + \frac{a_m^4 \sigma_k^4}{4} \int \big(f'(x)\big)^2 f(x)\,dx + o\big(a_m/m + a_m^4\big), \qquad (4)$$
where $C = \int t\,k(t)\,K(t)\,dt$ and $\sigma_k^2$ is the variance of $k(\cdot)$. For further discussion, see the works by [18,19,20,21,22,23,24,25], among others. For cases in which the data are censored, we refer to the works of [26,27].
A natural extension of Equation (3) can be defined as follows:
Let $\{W_m\}$ be a sequence of known df's. We define an extended estimate of $F(x)$ using the following:
$$\hat F_m(x) = m^{-1} \sum_{i=1}^{m} W_m(x - X_i), \qquad (5)$$
where $W_m(u) \to I_{(0,\infty)}(u)$ as $m \to \infty$. Note that the kernel df estimate takes $W_m(u) = K(u/a_m)$.
Our smooth estimate of $p$ is defined as follows: let $\{W_{m,n}\}$ be a rectangular array of known df's satisfying the following:
$$W_{m,n}(u) \to I_{(0,\infty)}(u) \quad \text{as } \min(m,n) \to \infty. \qquad (6)$$
We propose estimating p using the following:
$$\hat p_{m,n} = \iint W_{m,n}(y - x)\,dF_m(x)\,dG_n(y) = (mn)^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n} W_{m,n}(Y_j - X_i). \qquad (7)$$
Note that we can write $\hat p_{m,n} = n^{-1} \sum_{j=1}^{n} \hat F_m(Y_j)$, where $\hat F_m(x) = m^{-1} \sum_{i=1}^{m} W_{m,n}(x - X_i)$. Examples of the $W_{m,n}(u)$ arrays include the following (a small computational sketch follows the list):
1. $W_{m,n}(u) = \frac{1}{2}\,[W_m(u) + W_n(u)]$;
2. $W_{m,n}(u) = \int W_m(u - v)\,dW_n(v)$;
3. $W_{m,n}(u) = [W_m(u)\,W_n(u)]^{1/2}$.
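As a concrete illustration, the following is a minimal Python sketch of $\hat p_{m,n}$ in Equation (7), using array 1 above with a Gaussian $K$; the function name, the Gaussian choice, and the bandwidths $a_m = m^{-\delta}$, $b_n = n^{-\delta}$ (which satisfy the condition $\frac{1}{2} < \delta < 1$ discussed in Remark (I) below) are our illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def smooth_mww(x, y, delta=0.6):
    """Smooth MWW estimate of p = P(X < Y), Equation (7), with
    W_{m,n}(u) = 0.5 * [K(u/a_m) + K(u/b_n)] (array 1) and Gaussian K."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m, n = len(x), len(y)
    a_m, b_n = m ** -delta, n ** -delta      # illustrative bandwidths
    d = y[None, :] - x[:, None]              # all differences Y_j - X_i
    w = 0.5 * (norm.cdf(d / a_m) + norm.cdf(d / b_n))
    return w.mean()                          # the (mn)^{-1} double sum
```

As $a_m, b_n \to 0$, $W_{m,n}(u) \to I_{(0,\infty)}(u)$ and the statistic reduces to the classical $p_{m,n}$ of Equation (1), up to the smoothing at ties.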

2. Results

2.1. Large Sample Theory of $\hat p_{m,n}$

(i) Asymptotic unbiasedness: if $W_{m,n}(u) \to I_{(0,\infty)}(u)$ as $\min(m,n) \to \infty$, then $E\hat p_{m,n} \to p$.
Proof. 
Note that $E\hat p_{m,n} = \iint W_{m,n}(y - x)\,dF(x)\,dG(y)$, and let $H_{m,n}(y) = \int W_{m,n}(y - x)\,dF(x)$. Thus, the characteristic function of $H_{m,n}$ is as follows:
$$\tilde H_{m,n}(t) = \tilde W_{m,n}(t)\,\tilde F(t), \qquad (8)$$
where $\tilde W_{m,n}$ ($\tilde F$) denotes the characteristic function of $W_{m,n}$ ($F$). Since $W_{m,n}(x) \to I_{(0,\infty)}(x)$, we have $\tilde W_{m,n}(t) \to 1$ as $\min(m,n) \to \infty$ and, thus, $\tilde H_{m,n}(t) \to \tilde F(t)$. Hence, $H_{m,n}(\cdot) \to F(\cdot)$ as $\min(m,n) \to \infty$ at each continuity point of $F$. The Lebesgue dominated convergence theorem is then applied to obtain the result. □
(ii) Weak (and $L_1$) consistency: if $W_{m,n}(u) \to I_{(0,\infty)}(u)$ as $\min(m,n) \to \infty$, then $\hat p_{m,n} \to p$ in probability as $\min(m,n) \to \infty$.
Proof. 
From Part (i), we only need to show that $V(\hat p_{m,n}) \to 0$ as $\min(m,n) \to \infty$. We will show a stronger result: for sufficiently large values of $m$ and $n$,
$$V(\hat p_{m,n}) \le \frac{2}{m} \iint_{y_1 < y_2} F(y_1)\,[1 - F(y_2)]\,dG(y_1)\,dG(y_2) + \frac{2}{n} \iint_{x_1 < x_2} G(x_1)\,[1 - G(x_2)]\,dF(x_1)\,dF(x_2) = \sigma_{m,n}^2. \qquad (9)$$
Note that
$$\begin{aligned} V(\hat p_{m,n}) ={}& (mn)^{-1} E W_{m,n}^2(Y_1 - X_1) + [(m-1)/mn]\, E W_{m,n}(Y_1 - X_1)\,W_{m,n}(Y_1 - X_2) \\ &+ [(n-1)/mn]\, E W_{m,n}(Y_1 - X_1)\,W_{m,n}(Y_2 - X_1) - [(m+n-1)/mn]\,[E W_{m,n}(Y_1 - X_1)]^2 \\ ={}& A_1 + A_2 + A_3 + A_4. \end{aligned}$$
Since $W_{m,n}^2(u) \to I_{(0,\infty)}(u)$ as $\min(m,n) \to \infty$, using an argument similar to Part (i), we can show that $\int W_{m,n}^2(y - x)\,dF(x) \to F(y)$ as $\min(m,n) \to \infty$ at each continuity point $y$ of $F$. Thus, it follows that $A_1 \to 0$ as $\min(m,n) \to \infty$; in fact, $A_1 = O[(mn)^{-1}]$. Next, for sufficiently large values of $m$ and $n$, we can see the following by the same reasoning:
$$A_2 = [(m-1)/mn] \iiint W_{m,n}(y - x_1)\,W_{m,n}(y - x_2)\,dF(x_1)\,dF(x_2)\,dG(y) \approx n^{-1} \iint G[\max(x_1, x_2)]\,dF(x_1)\,dF(x_2),$$
since, for sufficiently large values of $m$ and $n$, $\int W_{m,n}(y - x_1)\,W_{m,n}(y - x_2)\,dG(y) \approx G[\max(x_1, x_2)]$. The proof for $A_3$ is similar. Finally, $A_4 \approx -[(m+n-1)/mn]\,p^2$, and the desired conclusion is reached by collecting terms. □
(iii) Strong consistency: if $W_{m,n}(u) \to I_{(0,\infty)}(u)$ as $\min(m,n) \to \infty$, then $\hat p_{m,n} \to p$ with probability one.
Proof. 
Since, according to Part (i), $E\hat p_{m,n} \to p$ as $\min(m,n) \to \infty$, we only need to consider $[\hat p_{m,n} - E\hat p_{m,n}]$. However, since $W_{m,n}(\cdot)$ is a distribution function, integration by parts gives the following:
$$\begin{aligned} |\hat p_{m,n} - E\hat p_{m,n}| &= \Big| \iint W_{m,n}(y - x)\,dF_m(x)\,dG_n(y) - \iint W_{m,n}(y - x)\,dF(x)\,dG(y) \Big| \\ &\le \Big| \iint W_{m,n}(y - x)\,dF_m(x)\,dG_n(y) - \iint W_{m,n}(y - x)\,dF(x)\,dG_n(y) \Big| \\ &\quad + \Big| \iint W_{m,n}(y - x)\,dF(x)\,dG_n(y) - \iint W_{m,n}(y - x)\,dF(x)\,dG(y) \Big| \\ &= \Big| \iint [F_m(x) - F(x)]\,dW_{m,n}(y - x)\,dG_n(y) \Big| + \Big| \iint [G_n(y) - G(y)]\,dW_{m,n}(y - x)\,dF(x) \Big| \\ &\le \sup_x |F_m(x) - F(x)| + \sup_y |G_n(y) - G(y)| \\ &= O\Big( \big(\tfrac{\ln \ln m}{m}\big)^{1/2} \Big) + O\Big( \big(\tfrac{\ln \ln n}{n}\big)^{1/2} \Big) = O\Big( \big(\tfrac{\ln \ln \min(m,n)}{\min(m,n)}\big)^{1/2} \Big) = o(1) \quad \text{almost surely}. \end{aligned}$$
Here, we have used the standard law of the iterated logarithm for empirical distribution functions. □
(iv) Asymptotic normality: if $W_{m,n}(u) \to I_{(0,\infty)}(u)$ and $[\min(m,n)]^{1/2}\,(E\hat p_{m,n} - p) \to 0$ as $\min(m,n) \to \infty$, then $(\hat p_{m,n} - p)/\sigma_{m,n}$ is asymptotically standard normal, where $\sigma_{m,n}$ is as given in Equation (9).
Proof. 
Under the above conditions, it is sufficient to show that $(\hat p_{m,n} - E\hat p_{m,n})/\sigma_{m,n}$ is asymptotically standard normal. To this end, we write the following:
$$\begin{aligned} \hat p_{m,n} - E\hat p_{m,n} ={}& \Big\{ \iint [G_n(y) - G(y)]\,dW_{m,n}(y - x)\,dF(x) - \iint [F_m(x) - F(x)]\,dW_{m,n}(y - x)\,dG(y) \Big\} \\ &+ \Big\{ -\iint [F_m(x) - F(x)]\,dW_{m,n}(y - x)\,d[G_n(y) - G(y)] \Big\} = B + C. \end{aligned}$$
Clearly, $B$ is the difference between two independent sample averages; thus, $B$ is asymptotically normal with mean zero and variance $\sigma_{m,n}^2$. Additionally, note that $EC = o(1)$ and, if $m$ and $n$ are large enough, then by the methods in Part (i) we obtain the following:
$$EC^2 \le (mn)^{-1} \Big\{ \int F(x)\,(1 - F(x))\,dG(x) - 2 \iint_{x_1 < x_2} F(x_1)\,(1 - F(x_2))\,dG(x_1)\,dG(x_2) \Big\} = O[(mn)^{-1}].$$
Hence, $\min(m,n)\,EC^2 = O[\min(m,n)/(mn)] = o(1)$; thus, $[\min(m,n)]^{1/2}\,C \to 0$ in probability. The conclusion is now obtained. □
Some remarks:
(I) A sufficient condition for $[\min(m,n)]^{1/2}\,(E\hat p_{m,n} - p) \to 0$ is that $[\min(m,n)]^{1/2} \int |t|^{\alpha}\,dW_{m,n}(t) \to 0$ as $\min(m,n) \to \infty$, provided that $F$ is Lipschitz of order $0 < \alpha \le 1$.
Proof. 
Note that $E\hat p_{m,n} = \int E\hat F_{m,n}(y)\,dG(y)$, where $E\hat F_{m,n}(y) = \int W_{m,n}(y - t)\,dF(t)$. Thus, by integration by parts, the following is obtained:
$$\Big| \iint W_{m,n}(y - x)\,dF(x)\,dG(y) - \int F(x)\,dG(x) \Big| = \Big| \int E\hat F_{m,n}(x)\,dG(x) - \int F(x)\,dG(x) \Big| \le \iint |F(x - t) - F(x)|\,dW_{m,n}(t)\,dG(x) \le C \int |t|^{\alpha}\,dW_{m,n}(t) = o\big( [\min(m,n)]^{-1/2} \big).$$
If we take $W_{m,n}(t) = [K(t/a_m) + K(t/b_n)]/2$ with a known df $K(\cdot)$ and $k(t) = \frac{d}{dt} K(t)$, where $a_m = C_1 m^{-\delta}$ with $C_1 > 0$ and $\frac{1}{2} < \delta < 1$, and $b_n = C_2 n^{-\beta}$ with $C_2 > 0$ and $\frac{1}{2} < \beta < 1$, then the above condition is met for $\alpha = 1$ and $\delta = \beta$. Of course, if $W_{m,n}(u) = I_{(0,\infty)}(u)$, then $[\min(m,n)]^{1/2}\,(E\hat p_{m,n} - p) = 0$.
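As a quick check of this condition with, say, a Gaussian $K$ (our illustrative choice), $\int |t|\,dK(t/a_m) = a_m \int |t|\,k(t)\,dt = a_m \sqrt{2/\pi}$; hence, with $a_m = C_1 m^{-\delta}$, $\delta = \beta$, and $m \le n$, we obtain $[\min(m,n)]^{1/2} \int |t|\,dW_{m,n}(t) = O(m^{1/2 - \delta}) \to 0$ whenever $\delta > \frac{1}{2}$, as required for $\alpha = 1$.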
(II) If one wishes to construct an asymptotic confidence interval for $p = P(X < Y)$, a consistent estimate of $\sigma_{m,n}^2$ is needed. This estimate can easily be obtained using the following:
$$\hat\sigma_{m,n}^2 = \frac{2}{m} \iint_{y_1 < y_2} \hat F_m(y_1)\,\big(1 - \hat F_m(y_2)\big)\,dG_n(y_1)\,dG_n(y_2) + \frac{2}{n} \iint_{x_1 < x_2} \hat G_n(x_1)\,\big(1 - \hat G_n(x_2)\big)\,dF_m(x_1)\,dF_m(x_2),$$
where $\hat F_m(x) = m^{-1} \sum_{i=1}^{m} W_{m,n}(x - X_i)$ and $F_m(x)$ is the empirical df of the $X$ sample, with $\hat G_n(y)$ and $G_n(y)$ defined analogously. Thus, we obtain the confidence bounds as follows:
$$\hat p_{m,n} \pm Z_{\alpha/2}\,\hat\sigma_{m,n} / [\min(m,n)]^{1/2}.$$
In addition, note that $\sigma_{m,n}^2 \le [\min(m,n)]^{-1}\,2\big\{ \iint_{y_1 < y_2} F(y_1)\,(1 - F(y_2))\,dG(y_1)\,dG(y_2) + \iint_{x_1 < x_2} G(x_1)\,(1 - G(x_2))\,dF(x_1)\,dF(x_2) \big\} = [\min(m,n)]^{-1} \sigma^2$. Thus, in Part (iv) above, we can write that $(\hat p_{m,n} - p)\,[\min(m,n)]^{1/2} / \hat\sigma_{m,n}$ is asymptotically standard normal.
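The empirical double integrals in the estimate above reduce to double sums over the ordered sample values. The following is a minimal sketch of the resulting confidence bounds, reusing the conventions of the earlier smooth_mww sketch (the Gaussian $K$ and the bandwidths $m^{-\delta}$, $n^{-\delta}$ are again our illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def smooth_mww_ci(x, y, alpha=0.05, delta=0.6):
    """Asymptotic confidence bounds for p = P(X < Y) from Remark (II):
    p_hat +/- z_{alpha/2} * sigma_hat / min(m, n)^(1/2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m, n = len(x), len(y)
    a_m, b_n = m ** -delta, n ** -delta
    W = lambda u: 0.5 * (norm.cdf(u / a_m) + norm.cdf(u / b_n))

    p_hat = W(y[None, :] - x[:, None]).mean()

    # Smooth df estimates evaluated at the sorted samples.
    ys, xs = np.sort(y), np.sort(x)
    F_hat = W(ys[:, None] - x[None, :]).mean(axis=1)   # F_m-hat at y_(1) <= ... <= y_(n)
    G_hat = W(xs[:, None] - y[None, :]).mean(axis=1)   # G_n-hat at x_(1) <= ... <= x_(m)

    # (2/m) * n^{-2} * sum_{j1 < j2} F_hat(y_(j1)) * (1 - F_hat(y_(j2))), via cumulative sums.
    t1 = (2.0 / m) * np.sum((1 - F_hat[1:]) * np.cumsum(F_hat)[:-1]) / n ** 2
    t2 = (2.0 / n) * np.sum((1 - G_hat[1:]) * np.cumsum(G_hat)[:-1]) / m ** 2

    se = np.sqrt(t1 + t2) / np.sqrt(min(m, n))
    z = norm.ppf(1 - alpha / 2)
    return p_hat - z * se, p_hat + z * se
```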
(III) The kernel method provides an easy way to generate $W_{m,n}(\cdot)$, at least for the special cases $W_{m,n}(u) = \frac{1}{2}\,[W_m(u) + W_n(u)]$ and $W_{m,n}(u) = \int W_m(u - v)\,dW_n(v)$, by defining $W_m$ ($W_n$) as a kernel df, $W_m(u) = K(u/a_m)$, $W_n(u) = K(u/b_n)$, where $K$ is a known df and $\{a_m\}$ ($\{b_n\}$) is a sequence of constants such that $a_m \to 0$ as $m \to \infty$ ($b_n \to 0$ as $n \to \infty$). As is now well known in the literature, the choices of $a_m$ and $b_n$ will, generally speaking, depend on the data. Thus, in such cases, the conditions on $W_{m,n}(u)$ will have to be adjusted.
(IV) Mann–Whitney–Wilcoxon statistics for paired data. Suppose that $(X_1, Y_1), \ldots, (X_n, Y_n)$ is drawn from a bivariate df $F(x, y)$. In this case, we may be interested in estimating either (i) $P(X_1 < Y_1)$ or (ii) $P(X_1 < Y_2)$. We propose estimating $p_1 = P(X_1 < Y_1)$ using $\hat p_1 = \int W_n(y - x)\,dF_n(x, y) = n^{-1} \sum_{i=1}^{n} W_n(Y_i - X_i)$, where $\{W_n\}$ is a sequence of df's converging to $I_{(0,\infty)}(\cdot)$ as $n \to \infty$. Note that $\hat p_1$ is an average of independent random variables; thus, one can study its properties without much difficulty, and we leave the details to interested readers. If it is necessary to estimate $p_2 = P(X_1 < Y_2)$, we propose the estimate $\hat p_2 = [n(n-1)]^{-1} \sum_{i \ne j} W_n(Y_j - X_i)$. Thus, $\hat p_2 = \frac{n}{n-1}\,\hat p_n - \frac{1}{n-1}\,\hat p_1$, where $\hat p_n = \iint W_n(y - x)\,dF_n(x)\,dG_n(y) = n^{-2} \sum_i \sum_j W_n(Y_j - X_i)$. The asymptotic properties of $\hat p_2$ are not trivial to deduce, but can be obtained using the methods described in this paper. The consistency (weak and strong) of $\hat p_2$ when we have a random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ is obtained under the condition that $W_n(\cdot) \to I_{(0,\infty)}(\cdot)$ as $n \to \infty$. The asymptotic normality is obtained with the following approximate variance (a computational sketch of $\hat p_1$ and $\hat p_2$ follows this remark):
$$\sigma^2 = \int G^2(x)\,dF(x) + \int (1 - F(x))^2\,dG(x) + 2 \iint G(x)\,(1 - F(y))\,dH(x, y) - 4 p_2^2,$$
where $H(x, y)$ denotes the joint df of $(X_1, Y_1)$. To estimate $\sigma^2$, we propose the following:
$$\hat\sigma_n^2 = \int G_n^2(x)\,dF_n(x) + \int (1 - F_n(x))^2\,dG_n(x) + 2 \iint G_n(x)\,(1 - F_n(y))\,dH_n(x, y) - 4 \hat p_2^2.$$
Thus, $\sqrt{n}\,(\hat p_2 - p_2)/\hat\sigma_n$ is asymptotically normal, provided that $\sqrt{n}\,(E\hat p_2 - p_2) \to 0$ as $n \to \infty$.
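The following is a minimal sketch of these paired-data estimates, with a Gaussian $W_n(u) = K(u/a_n)$ and $a_n = n^{-\delta}$ as our illustrative choices:

```python
import numpy as np
from scipy.stats import norm

def paired_smooth_mww(x, y, delta=0.6):
    """Smooth estimates of p1 = P(X1 < Y1) and p2 = P(X1 < Y2) for
    paired samples (X_i, Y_i), following Remark (IV)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    W = lambda u: norm.cdf(u / n ** -delta)   # W_n with an illustrative bandwidth
    p1_hat = W(y - x).mean()                  # within-pair comparisons
    p_n = W(y[None, :] - x[:, None]).mean()   # all n^2 cross comparisons
    p2_hat = (n * p_n - p1_hat) / (n - 1)     # remove the i = j terms
    return p1_hat, p2_hat
```

The identity $\hat p_2 = (n \hat p_n - \hat p_1)/(n - 1)$ used in the last line is simply the relation stated in the remark, written in a single step.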

2.2. Robustness of $\hat p_{m,n}$ against Dependence

In this section, we assume that $X_1, \ldots, X_m$ ($Y_1, \ldots, Y_n$) denote the first $m$ ($n$) units of a sequence $\{X_m\}$ ($\{Y_n\}$) satisfying the following strong mixing condition, c.f. [28]. Let $\mathcal{F}_a^b$ denote the $\sigma$-field generated by $X_a, \ldots, X_b$; then, $\{X_m\}$ is said to be strong mixing if there is a function $\alpha(\cdot)$ defined on the integers, such that $\alpha(m) \to 0$ as $m \to \infty$, and the following holds:
$$|P(A \cap B) - P(A)\,P(B)| \le \alpha(m)$$
for all $A \in \mathcal{F}_1^a$ and $B \in \mathcal{F}_{a+m}^{\infty}$. Throughout this section, we shall assume that $\{X_m\}$ ($\{Y_n\}$) are strictly stationary. To establish the results of this section, we need some definitions.
Let $U_{m,n}(X_1) = E[W_{m,n}(Y_1 - X_1) \mid X_1]$ and $V_{m,n}(Y_1) = E[W_{m,n}(Y_1 - X_1) \mid Y_1]$, and define the following:
$$R_{m,n} = [\min(m,n)]^{1/2} \Big\{ m^{-1} \sum_{i=1}^{m} U_{m,n}(X_i) + n^{-1} \sum_{j=1}^{n} V_{m,n}(Y_j) - (mn)^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n} W_{m,n}(Y_j - X_i) - E W_{m,n}(Y_1 - X_1) \Big\}.$$
The next lemma is instrumental in the development of this section.
Lemma 1. If $\sum_{m=1}^{\infty} \alpha(m) < \infty$ and $\sum_{n=1}^{\infty} \alpha(n) < \infty$, then $E R_{m,n}^2 \to 0$ as $\min(m,n) \to \infty$.
Proof. 
For $1 \le i \le m$ and $1 \le j \le n$, define the following:
$$\varphi_{m,n}(i,j) = U_{m,n}(X_i) + V_{m,n}(Y_j) - W_{m,n}(Y_j - X_i) - E W_{m,n}(Y_j - X_i).$$
Then, we can easily see the following:
$$\begin{aligned} E R_{m,n}^2 &= \frac{\min(m,n)}{(mn)^2}\, E \Big\{ \sum_i \sum_j \varphi_{m,n}(i,j) \Big\}^2 \\ &= \frac{\min(m,n)}{(mn)^2} \Big\{ \sum_i \sum_j E \varphi_{m,n}^2(i,j) + \sum_i \sum_{j \ne j^*} E \varphi_{m,n}(i,j)\,\varphi_{m,n}(i,j^*) + \sum_{i \ne i^*} \sum_j E \varphi_{m,n}(i,j)\,\varphi_{m,n}(i^*,j) \\ &\qquad + \sum_{i \ne i^*} \sum_{j \ne j^*} E \varphi_{m,n}(i,j)\,\varphi_{m,n}(i^*,j^*) \Big\}. \end{aligned} \qquad (20)$$
We shall consider each term separately. Since $|\varphi(i,j)| \le 2$, the first sum is of the order $[\min(m,n)/mn] \to 0$ as $\min(m,n) \to \infty$. From now on, we drop the $m$ and $n$ suffixes from $\varphi$. Next, by the lemma of [28], since $\{X_i\}$ and $\{Y_j\}$ are strictly stationary, we can see the following:
$$\Big| \sum_i \sum_{j \ne j^*} E \varphi(i,j)\,\varphi(i,j^*) \Big| \le 2 \sum_i \sum_{j < j^*} |E \varphi(i,j)\,\varphi(i,j^*)| = 2 \sum_i \sum_{j=1}^{n} (n - j + 1)\,|E \varphi(i,1)\,\varphi(i,j+1)| \le C\,m \sum_{j=1}^{n} (n - j + 1)\,\alpha(j) = C\,mn \sum_{j=1}^{n} \Big( \frac{n - j + 1}{n} \Big)\,\alpha(j) \le C\,mn \Big[ \sum_{j=1}^{\infty} \alpha(j) - \frac{1}{n} \sum_{j=1}^{n} j\,\alpha(j) \Big].$$
Thus, the second sum in Equation (20) is bounded above by the following:
$$C \Big[ \frac{\min(m,n)}{mn} \Big] \Big[ \sum_{j=1}^{\infty} \alpha(j) - \frac{1}{n} \sum_{j=1}^{n} j\,\alpha(j) \Big],$$
which converges to zero, since $\sum_{j=1}^{\infty} \alpha(j) < \infty$ and, by Kronecker's lemma, $n^{-1} \sum_{j=1}^{n} j\,\alpha(j) \to 0$ as $n \to \infty$. In a similar way, we can show that the third term in Equation (20) converges to zero as $\min(m,n) \to \infty$. Finally, the fourth term is less than or equal to
$$4 \sum_{i < i^*} \sum_{j < j^*} |E \varphi(i,j)\,\varphi(i^*,j^*)| = 4 \sum_{i=1}^{m} \sum_{j=1}^{n} (m - i + 1)(n - j + 1)\,|E \varphi(1,1)\,\varphi(i+1, j+1)|.$$
However, since $|E \varphi(1,1)\,\varphi(i+1,j+1)| \le C\,\alpha(i)$ and $|E \varphi(1,1)\,\varphi(i+1,j+1)| \le C\,\alpha(j)$ (note that here and elsewhere, $C$ denotes a generic positive constant that is not necessarily the same from place to place), we have $|E \varphi(1,1)\,\varphi(i+1,j+1)| \le C\,\alpha(\max(i,j))$. Thus, the last term in Equation (20) is less than or equal to
$$C \sum_{i=1}^{m} \sum_{j=1}^{n} (m - i + 1)(n - j + 1)\,\alpha(\max(i,j)) \le C\,mn \sum_i \sum_j \alpha(\max(i,j)) = C\,mn \Big\{ \sum_i \sum_{j=1}^{i} \alpha(i) + \sum_{i=1}^{m} \sum_{j=i+1}^{n} \alpha(j) \Big\} \le C\,mn \Big\{ \sum_{i=1}^{m} i\,\alpha(i) + \sum_{i=1}^{m} \Big[ \sum_{j=i+1}^{\infty} \alpha(j) \Big] \Big\} = o(m^2 n). \qquad (23)$$
To show Equation (23), let $k_m$ be an integer such that $k_m < m$, $k_m \to \infty$ as $m \to \infty$, and $k_m = o(m)$. Then, the following is obtained:
$$\sum_{i=1}^{m} i\,\alpha(i) = \sum_{i=1}^{k_m} i\,\alpha(i) + \sum_{i=k_m+1}^{m} i\,\alpha(i) \le k_m \sum_{i=1}^{\infty} \alpha(i) + m \Big[ \sum_{i=k_m}^{\infty} \alpha(i) \Big] = O(k_m) + m\,o(1) = o(m).$$
Next, since $\sum_{m=1}^{\infty} \alpha(m) < \infty$, we have $\sum_{i=1}^{m} \big[ \sum_{j=i+1}^{\infty} \alpha(j) \big] = o(m)$ as $m \to \infty$. Thus, Equation (23) is proved, as is the lemma. □
In light of Lemma 1, the consistency and asymptotic normality of $\hat p_{m,n}$ can be analyzed just by looking at $m^{-1} \sum_{i=1}^{m} U_{m,n}(X_i) - E W_{m,n}(Y_1 - X_1)$ and $n^{-1} \sum_{j=1}^{n} V_{m,n}(Y_j) - E W_{m,n}(Y_1 - X_1)$. In addition, note that $E U_{m,n}(X_1) = E V_{m,n}(Y_1) = E W_{m,n}(Y_1 - X_1)$. Now, by stationarity, the following is obtained:
$$V \Big[ m^{-1} \sum_{i=1}^{m} U_{m,n}(X_i) \Big] = m^{-2} \Big\{ \sum_{i=1}^{m} V\big( U_{m,n}(X_i) \big) + \sum_{i \ne i^*} \mathrm{cov}\big( U_{m,n}(X_i), U_{m,n}(X_{i^*}) \big) \Big\} = m^{-1}\,V\big( U_{m,n}(X_1) \big) + 2 m^{-2} \sum_{i=1}^{m} (m - i + 1)\,\mathrm{cov}\big( U_{m,n}(X_1), U_{m,n}(X_{i+1}) \big).$$
Similarly, we can express $V\big( n^{-1} \sum_{j=1}^{n} V_{m,n}(Y_j) \big)$. Thus, for sufficiently large values of $m$ and $n$, we obtain the following:
$$V \Big[ m^{-1} \sum_{i=1}^{m} U_{m,n}(X_i) + n^{-1} \sum_{j=1}^{n} V_{m,n}(Y_j) \Big] \approx \sigma_{m,n}^2 + 2 m^{-2} \sum_{i=1}^{m} (m - i + 1)\,\mathrm{cov}\big( U_{m,n}(X_1), U_{m,n}(X_{i+1}) \big) + 2 n^{-2} \sum_{j=1}^{n} (n - j + 1)\,\mathrm{cov}\big( V_{m,n}(Y_1), V_{m,n}(Y_{j+1}) \big).$$
Let us now evaluate these covariances for large samples. For large values of $m$ and $n$, the following is obtained:
$$\mathrm{cov}\big( U_{m,n}(X_1), U_{m,n}(X_{i+1}) \big) = \iint U_{m,n}(x_1)\,U_{m,n}(x_{i+1})\,dF_i(x_1, x_{i+1}) - [E W_{m,n}(Y_1 - X_1)]^2 \approx \iint \big[ F_i(x_1, x_{i+1}) - F(x_1)\,F(x_{i+1}) \big]\,dG(x_1)\,dG(x_{i+1}) = \sigma_{1i}(X),$$
where $F_i(\cdot,\cdot)$ denotes the joint df of $(X_1, X_{i+1})$. The second covariance term is handled similarly. Hence, for sufficiently large values of $m$ and $n$, the following is obtained:
$$V(\hat p_{m,n}) \approx \sigma_{m,n}^2 + 2 m^{-2} \sum_{i=1}^{m} (m - i + 1)\,\sigma_{1i}(X) + 2 n^{-2} \sum_{j=1}^{n} (n - j + 1)\,\sigma_{1j}(Y) = \Delta_{m,n}^2, \ \text{say}.$$
Now, let us write $\Delta_{m,n}^2 = \Delta_{m,n}^2(X) + \Delta_{m,n}^2(Y)$, where $\Delta_{m,n}^2(X)$ [$\Delta_{m,n}^2(Y)$] is the approximate variance of $m^{-1} \sum_{i=1}^{m} U_{m,n}(X_i)$ [$n^{-1} \sum_{j=1}^{n} V_{m,n}(Y_j)$]. Note that $|U_{m,n}(X_i)| \le 1$ and $|V_{m,n}(Y_j)| \le 1$. Thus, by Lemma 2.1 of [29], the following is obtained:
$$P \Big[ \Big| m^{-1} \sum_{i=1}^{m} U_{m,n}(X_i) - E U_{m,n}(X_i) \Big| \ge t \Big] \le 2\,\big( 1 + K\,\alpha(M) \big)^P e^{-P t^2 / 2}, \qquad (27)$$
where $P = P(m)$ and $M = M(m)$ are integer-valued functions such that $2PM \le m$.
Now, we can state and prove the properties of $\hat p_{m,n}$ when the samples are drawn from strictly stationary, strong mixing processes.
(i) Weak consistency: if $\sum_{m} \alpha(m) < \infty$ and $\sum_{n} \alpha(n) < \infty$, and if $W_{m,n}(u) \to I_{(0,\infty)}(u)$ as $\min(m,n) \to \infty$, then $\hat p_{m,n} \to p$ in probability as $\min(m,n) \to \infty$.
Proof. 
This follows directly from the decomposition $\hat p_{m,n} - E\hat p_{m,n} = (\bar U_{m,n}(X) - E\hat p_{m,n}) + (\bar V_{m,n}(Y) - E\hat p_{m,n}) - R_{m,n}/[\min(m,n)]^{1/2}$, the law of large numbers for ergodic sequences applied to the first two terms, and the fact that $R_{m,n}/[\min(m,n)]^{1/2} \to 0$ in probability as $\min(m,n) \to \infty$, according to Lemma 1. □
(ii) Strong consistency: assume that the conditions of Part (i) hold and, in addition, that there are integer-valued functions $M(m)$ and $P(m)$ ($N(n)$ and $Q(n)$), such that $2MP \le m$ ($2NQ \le n$) and, for any $t > 0$, $\sum_{m=1}^{\infty} [1 + C\,\alpha(M)]^P e^{-P t^2 / 2} < \infty$ ($\sum_{n=1}^{\infty} [1 + C\,\alpha(N)]^Q e^{-Q t^2 / 2} < \infty$); then, $\hat p_{m,n} \to p$ with probability one as $\min(m,n) \to \infty$.
Proof. 
Again, from Lemma 1, $E R_{m,n}^2 = o\big( [\min(m,n)]^{-(1+\gamma)} \big)$ for some $0 < \gamma < 1$ (by choosing $k_m = m^{\gamma}$ in the proof of that lemma). Thus, with $\nu = \min(m,n)$ and writing $R_{\nu}$ for $R_{m,n}$, we see that, for any $\varepsilon > 0$, $\sum_{\nu} P[|R_{\nu}| > \varepsilon] < \infty$; thus, $R_{\nu} \to 0$ with probability one as $\nu \to \infty$. The conclusion is obtained by applying Equation (27) to the $U_{m,n}(X_i)$'s and $V_{m,n}(Y_j)$'s. □
(iii) Asymptotic normality: note that, for large values of $m$ and $n$, we obtain the following:
$$\mathrm{Var}(\hat p_{m,n}) \approx [\min(m,n)]^{-1}\,\{ \sigma^2 + \gamma(X) + \delta(Y) \} = [\min(m,n)]^{-1}\,\sigma^{*2},$$
where $\gamma(X) = \lim_{\min(m,n) \to \infty} \frac{2}{m} \sum_{i=1}^{m} (m - i + 1)\,\sigma_{1i}(X)$, and $\delta(Y)$ is defined similarly, provided, of course, that these limits exist. In this case, we can write that $[\min(m,n)]^{1/2}\,(\hat p_{m,n} - p)/\sigma^*$ is asymptotically standard normal.
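To illustrate the robustness claim numerically, the following is a small Monte Carlo sketch (entirely our own construction, not from the paper) in which both samples are generated from Gaussian AR(1) processes, which are strictly stationary and strong mixing with geometrically decaying $\alpha(\cdot)$; the smooth estimate should still settle near the true $p$:

```python
import numpy as np
from scipy.stats import norm

def ar1(n, rho, rng):
    """Stationary Gaussian AR(1) series: strictly stationary, strong mixing."""
    z = np.empty(n)
    z[0] = rng.standard_normal()
    for t in range(1, n):
        z[t] = rho * z[t - 1] + np.sqrt(1 - rho ** 2) * rng.standard_normal()
    return z

rng = np.random.default_rng(0)
m = n = 400
x = ar1(m, 0.5, rng)          # X_t ~ N(0, 1) marginally
y = ar1(n, 0.5, rng) + 0.5    # Y_t ~ N(0.5, 1) marginally

# Smooth estimate with W_{m,n}(u) = 0.5 * [K(u/a_m) + K(u/b_n)], Gaussian K.
a_m, b_n = m ** -0.6, n ** -0.6
d = y[None, :] - x[:, None]
p_hat = (0.5 * (norm.cdf(d / a_m) + norm.cdf(d / b_n))).mean()

p_true = norm.cdf(0.5 / np.sqrt(2))   # P(X < Y) for the N(0,1) and N(0.5,1) marginals
print(p_true, p_hat)                   # the two values should be close
```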

3. Discussion

This study has shown that the smooth versions of the Mann–Whitney–Wilcoxon statistic are robust against a large class of dependent observations and can be used when the data are drawn from a smooth population. These procedures, which are based on smooth distribution function estimates, preserve the nature of the data and improve the efficiency of the procedure. In addition, by selecting a suitable rectangular array of known distribution functions, the smooth version of the MWW statistic allows for tests that better reflect the data. In future work, we shall apply the smooth MWW statistics to simulated and real data.

Author Contributions

Conceptualization, N.H. and I.A.A.; methodology, N.H.; validation, I.A.A.; writing—original draft preparation, N.H. and I.A.A.; writing—review and editing, I.A.A.; supervision, I.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mann, H.B.; Whitney, D.R. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 1947, 18, 50–60.
  2. Wilcoxon, F. Individual comparisons by ranking methods. Biom. Bull. 1945, 1, 80–83.
  3. Ahmad, I.A. A class of Mann-Whitney-Wilcoxon type statistics. Am. Stat. 1996, 50, 324–327.
  4. Priebe, C.E.; Cowen, L.J. A generalized Wilcoxon–Mann–Whitney statistic. Comm. Stat. Theory Methods 1999, 28, 2871–2878.
  5. Öztürk, Ö. A generalization of Ahmad's class of Mann-Whitney-Wilcoxon statistics. Aust. N. Z. J. Stat. 2001, 43, 67–74.
  6. Read, R.R. The asymptotic inadmissibility of the sample distribution function. Ann. Math. Stat. 1972, 43, 89–95.
  7. Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 1956, 27, 832–837.
  8. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
  9. Watson, G.S. Smooth regression analysis. Sankhya Indian J. Stat. Ser. A 1964, 26, 359–372.
  10. Nadaraya, E.A. Some new estimates for distribution functions. Theory Probab. Appl. 1964, 9, 497–500.
  11. Yamato, H. Uniform convergence of an estimator of a distribution function. Bull. Math. Stat. 1973, 15, 69–78.
  12. Winter, B.B. Strong uniform consistency of integrals of density estimators. Can. J. Stat. 1973, 1, 247–253.
  13. Winter, B.B. Convergence rate of perturbed empirical distribution functions. J. Appl. Probab. 1979, 16, 163–173.
  14. Reiss, R.D. Nonparametric estimation of smooth distribution functions. Scand. J. Stat. 1981, 8, 116–119.
  15. Falk, M. Relative efficiency and deficiency of kernel type estimators of smooth distribution functions. Stat. Neerland. 1983, 37, 73–83.
  16. Jones, M.C. The performance of kernel density functions in kernel distribution function estimation. Stat. Probab. Lett. 1990, 9, 129–132.
  17. Swanepoel, J.W.H. Mean integrated squared error properties and optimal kernels when estimating a distribution function. Comm. Stat. Theory Methods 1988, 17, 3785–3799.
  18. Wang, S. Nonparametric estimation of distribution functions. Metrika 1991, 38, 259–267.
  19. Ralescu, S.S. A remainder estimate for the normal approximation of perturbed sample quantiles. Stat. Probab. Lett. 1992, 14, 293–298.
  20. Shirahata, S.; Chu, I. Integrated squared error of kernel-type estimator of distribution function. Ann. Inst. Stat. Math. 1992, 44, 579–591.
  21. Berg, A.; Politis, D. CDF and survival function estimation with infinite-order kernels. Electron. J. Stat. 2009, 3, 1436–1454.
  22. Leblanc, A. On estimating distribution functions using Bernstein polynomials. Ann. Inst. Stat. Math. 2012, 64, 919–943.
  23. Tenreiro, C. Boundary kernels for distribution function estimation. REVSTAT-Stat. J. 2013, 11, 169–190.
  24. Bouredji, H.; Sayah, A. Bias correction at end points in kernel density estimation. In Proceedings of the International Conference on Advances in Applied Mathematics (ICAAM-2018), Sousse, Tunisia, 17–20 December 2018.
  25. Oryshchenko, V. Exact mean integrated squared error and bandwidth selection for kernel distribution function estimators. Comm. Stat. Theory Methods 2020, 49, 1603–1628.
  26. Ghorai, J.K.; Susarla, V. Kernel estimation of a smooth distribution function based on censored data. Metrika 1990, 37, 71–86.
  27. Alevizos, F.; Bagkavos, D.; Ioannides, D. Efficient estimation of a distribution function based on censored data. Stat. Probab. Lett. 2019, 145, 359–364.
  28. Ibragimov, I.A.; Linnik, Y.V. Independent and Stationary Sequences of Random Variables; Wolters-Noordhoff: Groningen, The Netherlands, 1971; pp. 202–205.
  29. Ahmad, I.A. Strong consistency of density estimation by orthogonal series methods for dependent variables with applications. Ann. Inst. Stat. Math. 1979, 31, 279–288.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
