1. Introduction
The product of two normally distributed variables was first studied in the early years of the 20th century [1, 2, 3]. Although there is no exact general expression for the distribution of the product, several authors have derived expressions for particular situations. The product does not follow any known family of distributions, but results exist for some specific cases [2, 3, 4]. These results apply only to particular cases of the product, however, and a general expression is still lacking. In 1981, a book of tables covering several cases was published by the AMS [5]. The most recent study of this problem [6] establishes the exact formula for the PDF of the product of two correlated standard normal random variables using a modified Bessel function of the second kind of order zero.
An earlier work used Rohatgi’s theorem [7] to obtain an exact distribution of the product of two uncorrelated normally distributed random variables; the authors expressed the integral using an infinite series expansion and modified Bessel functions of the second kind [8, 9]. Although there are several ways to determine the PDF of the product of two uncorrelated normally distributed variables, no closed-form expression is available. We only have partial approximations [2, 3], some algorithms [10] of limited use, and general formulas that have to be evaluated numerically [6, 8]. More recently, the product of zero-mean correlated random variables was studied in [11, 12, 13].
The PDF of the product of two normal variables is still an open problem, and the most common approach is the Monte Carlo Simulation. Our approach is based on the previous work of [9], whose authors propose a general expression for the product of two random variables, with limited application to normal variables. We have continued this work and generalized the approach to the product of any two uncorrelated normal variables, obtaining a good approximation of the PDF using numerical integration.
We used the analysis developed in [9] and wrote a function in R (https://www.r-project.org/, accessed on 31 July 2023) that calculates the PDF of the product of two normally distributed variables using Rohatgi’s theorem. To study the precision of our function, we plot these PDFs and compare them with a Monte Carlo Simulation of the product of the same variables.
To the best of our knowledge, the calculation of the PDF of the product of two normal variables is only performed via partial approximations for concrete examples or expressions involving integrals that do not have a closed form. This function, written in R, allows one to quickly and easily calculate the estimated value of the PDF of the product of any two uncorrelated normal variables. Although the formula works well in general, it may produce inaccurate values when the value of the product is very close to zero.
2. The Product of Two Normal Random Variables
Let $X$ and $Y$ be two continuous random variables with joint PDF $f_{X,Y}(x, y)$. When the two variables are independent, the PDF of $Z = XY$ can be defined as:
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y\!\left(\frac{z}{x}\right) \frac{1}{|x|}\, dx \qquad (1)$$
The PDF of $Z$ is then defined on a rectangular product space. In this paper, we consider that $X$ and $Y$ are independent variables.
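As a brief sketch of where the product-density integral comes from, the standard change-of-variables argument can be written as follows. The notation $f_X$, $f_Y$, $f_Z$ for the densities of $X$, $Y$ and $Z = XY$, and the auxiliary variable $W = X$, are assumptions introduced only for this derivation.

```latex
\begin{aligned}
(x, y) &\mapsto (z, w) = (xy,\; x), \qquad x = w,\quad y = \frac{z}{w},\\
\left|\frac{\partial(x, y)}{\partial(z, w)}\right| &= \frac{1}{|w|},\\
f_{Z,W}(z, w) &= f_X(w)\, f_Y\!\left(\frac{z}{w}\right)\frac{1}{|w|}
  \qquad \text{(independence of } X \text{ and } Y\text{)},\\
f_Z(z) &= \int_{-\infty}^{\infty} f_X(w)\, f_Y\!\left(\frac{z}{w}\right)\frac{1}{|w|}\, dw .
\end{aligned}
```

The factor $1/|w|$ makes the integrand undefined at zero, which is why ranges that contain 0 need special treatment.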
In [9], a case of Rohatgi’s theorem is considered for determining the PDF of the product of two independent random variables. The authors describe the product $Z = XY$, where $X$ is a random variable with PDF $f_X(x)$ on the interval $(a, b)$ with $0 < a < b < \infty$ and $Y$ is a random variable with PDF $f_Y(y)$ on an interval $(c, d)$ with $0 < c < d < \infty$. Then, $Z$, $X$ and $Y$ are in the first quadrant. The theorem is stated as follows.
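Theorem 1. Let $X$ be a random variable of continuous type with PDF $f_X(x)$, which is defined and positive on the interval $(a, b)$, where $0 < a < b < \infty$. Similarly, let $Y$ be a random variable of the continuous type with PDF $f_Y(y)$, which is defined and positive on the interval $(c, d)$, where $0 < c < d < \infty$.
The PDF of $Z = XY$, written $f_Z(z)$, is divided into three sections:
- 1. When $ad < bc$: $$f_Z(z) = \begin{cases} \int_{a}^{z/c} f_X(x)\, f_Y(z/x)\, \frac{1}{x}\, dx, & ac < z < ad \\ \int_{z/d}^{z/c} f_X(x)\, f_Y(z/x)\, \frac{1}{x}\, dx, & ad < z < bc \\ \int_{z/d}^{b} f_X(x)\, f_Y(z/x)\, \frac{1}{x}\, dx, & bc < z < bd \end{cases}$$
- 2. When $ad = bc$: $$f_Z(z) = \begin{cases} \int_{a}^{z/c} f_X(x)\, f_Y(z/x)\, \frac{1}{x}\, dx, & ac < z < ad \\ \int_{z/d}^{b} f_X(x)\, f_Y(z/x)\, \frac{1}{x}\, dx, & ad < z < bd \end{cases}$$
- 3. When $ad > bc$: $$f_Z(z) = \begin{cases} \int_{a}^{z/c} f_X(x)\, f_Y(z/x)\, \frac{1}{x}\, dx, & ac < z < bc \\ \int_{a}^{b} f_X(x)\, f_Y(z/x)\, \frac{1}{x}\, dx, & bc < z < ad \\ \int_{z/d}^{b} f_X(x)\, f_Y(z/x)\, \frac{1}{x}\, dx, & ad < z < bd \end{cases}$$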
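The first-quadrant case of the theorem can be illustrated with a minimal Python sketch. It is not the authors' R code: the function name `product_pdf_first_quadrant`, the `steps` parameter and the use of a midpoint rule are all assumptions made for illustration. The sketch relies on the fact that, for a given $z$, the admissible values of $x$ satisfy both $a < x < b$ and $c < z/x < d$, so the integration limits are $\max(a, z/d)$ and $\min(b, z/c)$.

```python
import math


def product_pdf_first_quadrant(z, f, g, a, b, c, d, steps=2000):
    """Approximate the density of Z = X * Y at z.

    f and g are the densities of X on (a, b) and Y on (c, d), with
    0 < a < b and 0 < c < d (both variables in the first quadrant).
    """
    # x must lie in (a, b) and z / x must lie in (c, d).
    lo = max(a, z / d)
    hi = min(b, z / c)
    if hi <= lo:
        return 0.0  # z is outside the support (a*c, b*d)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += f(x) * g(z / x) / x
    return total * h


# Illustration with two uniform variables on (1, 2), where the exact
# density of the product on 1 < z < 2 is log(z).
uniform = lambda t: 1.0
estimate = product_pdf_first_quadrant(1.5, uniform, uniform, 1.0, 2.0, 1.0, 2.0)
print(estimate, math.log(1.5))
```

The same function accepts truncated normal densities for `f` and `g`; the uniform case is used here only because its exact answer is easy to check.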
This theorem can be generalised to variables that lie entirely in one of the other quadrants (II, III and IV). When the range of a variable contains the value 0, the problem is more difficult, since some of the rectangular product spaces would lie in two quadrants or not be entirely inside one of them. One solution would be to treat 0 as an endpoint of the interval. This approach is more efficient than other approaches. In [14], a formula for calculating the PDF of the product of $n$ independent and identically distributed uniform random variables on a bounded interval is presented, using mostly Laplace and Mellin transform techniques. This procedure was improved in [15] to consider non-identically distributed variables. More recently, in [16], the authors use Fourier analysis to obtain the PDF of a product of $n$ independent and identically distributed uniform random variables on a bounded interval.
We developed a function for R [17] that implements the procedure described in [9] for two independent normally distributed variables, $X \sim N(\mu_x, \sigma_x^2)$ and $Y \sim N(\mu_y, \sigma_y^2)$.
We have to restrict the unbounded range of the normal distribution to a bounded range to be able to use the function defined in R. Thus, for each normal variable, we have to set two limits for its range. Fortunately, almost all of the probability mass of a normal distribution lies in a bounded interval around the mean; for example, about 99.7% of it lies within $[\mu - 3\sigma, \mu + 3\sigma]$. When 0 is in the range of either of the two normally distributed variables, we have to adopt a strategy to avoid the value 0, because Expression (1) is not defined when $x = 0$. We can keep the normal variable at least a small distance from zero by choosing a limit value $\epsilon > 0$ and excluding the interval $(-\epsilon, \epsilon)$. Numerical integration requires an interval and a step between consecutive points, so we have to set a barrier that excludes values of $x$ too near zero, because they produce errors and distorted results.
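The limits of integration lie in the bounded range of the variable, $(a, b)$. If this interval is not entirely positive or entirely negative, we must consider two intervals, one on each side of zero: $(a, -\epsilon)$ and $(\epsilon, b)$ for a small $\epsilon > 0$. If the normal variable is highly concentrated near 0, we can rescale it to avoid these values.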
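The range handling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names, the half-width `k = 3` standard deviations and the barrier `eps = 1e-3` are hypothetical defaults chosen for the example.

```python
def bounded_range(mu, sigma, k=3.0):
    """Bounded stand-in for the range of N(mu, sigma^2): mu +/- k * sigma.

    k = 3 is an assumed default, not a value taken from the paper.
    """
    return (mu - k * sigma, mu + k * sigma)


def split_around_zero(lo, hi, eps=1e-3):
    """Split (lo, hi) into sub-intervals that stay at least eps from zero.

    eps is an assumed barrier. An interval that is entirely positive or
    entirely negative is returned unchanged as a single piece.
    """
    pieces = []
    if lo < -eps:
        pieces.append((lo, min(hi, -eps)))  # negative part
    if hi > eps:
        pieces.append((max(lo, eps), hi))   # positive part
    return pieces


print(bounded_range(1.0, 0.5))
print(split_around_zero(-2.0, 4.0))
print(split_around_zero(1.0, 4.0))
```

Each returned piece lies on one side of zero, so a one-quadrant integration routine can be applied to it separately.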
We developed a function that calculates the PDF at $n$ points of the product $Z = XY$ of two uncorrelated normal variables $X \sim N(\mu_x, \sigma_x^2)$ and $Y \sim N(\mu_y, \sigma_y^2)$.
The function, named “product.Normal”, has five parameters: the mean and variance of each of the two normal variables involved in the product ($\mu_x$, $\mu_y$, $\sigma_x^2$, $\sigma_y^2$) and $n$, that is, the step between two consecutive points. The result of the function is a list of numbers that represents values of the PDF of the product over the range of the variable $Z$. To calculate the results, we considered six cases (or scenarios) and defined six internal procedures (named “product.normal” followed by the number of the scenario to be applied). (See Listing 1).
The different scenarios were motivated by the range of values of the two normal variables considered: only positive, only negative or mixed values. In the last case, the value zero is in the range of the normal variable.
Listing 1. A function to calculate the PDF of the product of two uncorrelated normally distributed variables (R code).
In this function, we have the following variables: $n$ is the step to consider between two consecutive points (values like 10 or 100 are the most common), $\mu_x$ is the mean of the $X$ variable, $\mu_y$ is the mean of the $Y$ variable and $\sigma_x^2$ and $\sigma_y^2$ are the variances of the two normally distributed variables $X$ and $Y$, respectively.
This function restricts the two uncorrelated normal variables to bounded ranges $(a, b)$ and $(c, d)$ and evaluates the product over a range whose endpoints are functions of $a$, $b$, $c$ and $d$. The function returns a vector with values of the PDF of the product $Z$.
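One way to obtain the range of the product from the ranges of the two variables is to take the smallest and largest of the four products of the endpoints, which holds for intervals of any sign. The Python sketch below illustrates this. The helper names are hypothetical, and reading a resolution value such as 10 or 100 as "points per unit of $z$" is an assumption rather than something stated in the text.

```python
def product_range(x_range, y_range):
    """Range of x * y when x is in x_range and y is in y_range."""
    a, b = x_range
    c, d = y_range
    corners = [a * c, a * d, b * c, b * d]
    return min(corners), max(corners)


def evaluation_grid(z_lo, z_hi, points_per_unit):
    """Equally spaced evaluation points on [z_lo, z_hi].

    points_per_unit is an assumed interpretation of the resolution
    parameter: 10 gives a spacing of 0.1, 100 a spacing of 0.01.
    """
    count = int(round((z_hi - z_lo) * points_per_unit)) + 1
    return [z_lo + i / points_per_unit for i in range(count)]


z_lo, z_hi = product_range((1.0, 2.0), (3.0, 4.0))
grid = evaluation_grid(z_lo, z_hi, 10)
print(z_lo, z_hi, len(grid))
```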
There are six scenarios:
Scenario 1: both $X$ and $Y$ take only positive values. In this case, the product takes only positive values.
Scenario 2: both $X$ and $Y$ take only negative values. In this case, the product takes only positive values.
Scenario 3: one variable takes only positive values and the other takes only negative values. In this case, the product takes only negative values.
Scenario 4: the range of $X$ includes positive and negative values and therefore contains zero, while $Y$ takes only positive values. We have to divide the range of $X$ into two sub-ranges, one negative and one positive, to avoid the zero value, since the integrals are not defined at this point.
Scenario 5: the range of $X$ includes positive and negative values and therefore contains zero, while $Y$ takes only negative values. We have to divide the range of $X$ into two sub-ranges, one negative and one positive, to avoid the zero value, since the integrals are not defined at this point.
Scenario 6: the ranges of both $X$ and $Y$ contain zero. We then have to split the ranges of both variables and estimate the values of the PDF of the product over the four combinations of their negative and positive sub-ranges.
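The choice among the six scenarios depends only on the sign pattern of the two bounded ranges, as the Python sketch below illustrates. It is not the authors' dispatch logic: the function names are hypothetical, and treating the two variables symmetrically (so that a zero-containing $Y$ with a positive $X$ also maps to Scenario 4) is an assumption.

```python
def sign_of_range(lo, hi):
    """Return '+', '-' or '0' for a positive, negative or zero-containing range."""
    if lo > 0:
        return "+"
    if hi < 0:
        return "-"
    return "0"


# Scenario number for each unordered pair of range signs.
_SCENARIOS = {
    ("+", "+"): 1,  # both positive
    ("-", "-"): 2,  # both negative
    ("+", "-"): 3,  # one positive, one negative
    ("+", "0"): 4,  # one contains zero, the other is positive
    ("-", "0"): 5,  # one contains zero, the other is negative
    ("0", "0"): 6,  # both contain zero
}


def classify_scenario(x_range, y_range):
    """Map two (lo, hi) ranges to a scenario number from 1 to 6."""
    pair = tuple(sorted((sign_of_range(*x_range), sign_of_range(*y_range))))
    return _SCENARIOS[pair]


print(classify_scenario((-1.0, 2.0), (1.0, 3.0)))
```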
Using the values calculated with our function, we can obtain the mean, variance, skewness and kurtosis of the estimated PDF of the product of the two normal variables.
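A possible way to turn a vector of PDF values into these four statistics is numerical integration over the grid, sketched below in Python. The function names are hypothetical, and two choices are assumptions rather than the authors' method: the trapezoidal rule, and renormalising the density so that it integrates to one over the truncated range.

```python
import math


def trapezoid(xs, ys):
    """Trapezoidal rule for tabulated values ys on the grid xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def moments_from_density(zs, pdf):
    """Mean, variance, skewness and excess kurtosis of a tabulated density."""
    mass = trapezoid(zs, pdf)
    dens = [p / mass for p in pdf]  # renormalise over the truncated range
    mean = trapezoid(zs, [z * p for z, p in zip(zs, dens)])

    def central(k):
        return trapezoid(zs, [(z - mean) ** k * p for z, p in zip(zs, dens)])

    var = central(2)
    skew = central(3) / var ** 1.5
    excess_kurtosis = central(4) / var ** 2 - 3.0
    return mean, var, skew, excess_kurtosis


# Sanity check on a standard normal density tabulated on [-8, 8].
zs = [-8.0 + 0.01 * i for i in range(1601)]
pdf = [math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) for z in zs]
print(moments_from_density(zs, pdf))
```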
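Although we do not know the distribution function of the product, we can calculate the statistics of the distribution (mean, variance, skewness and kurtosis) using the moment-generating function (MGF) of the product. For independent $X \sim N(\mu_x, \sigma_x^2)$ and $Y \sim N(\mu_y, \sigma_y^2)$, the MGF of $Z = XY$ is given by
$$M_Z(t) = \frac{1}{\sqrt{1 - \sigma_x^2 \sigma_y^2 t^2}} \exp\!\left( \frac{\mu_x \mu_y t + \tfrac{1}{2}\left(\mu_x^2 \sigma_y^2 + \mu_y^2 \sigma_x^2\right) t^2}{1 - \sigma_x^2 \sigma_y^2 t^2} \right)$$
Then, we have
$$E[Z] = \mu_x \mu_y, \qquad \mathrm{Var}[Z] = \mu_x^2 \sigma_y^2 + \mu_y^2 \sigma_x^2 + \sigma_x^2 \sigma_y^2,$$
$$\mathrm{Skewness}[Z] = \frac{6\, \mu_x \mu_y\, \sigma_x^2 \sigma_y^2}{\mathrm{Var}[Z]^{3/2}}, \qquad \mathrm{Kurtosis\ (excess)}[Z] = \frac{6\, \sigma_x^2 \sigma_y^2 \left( 2\left(\mu_x^2 \sigma_y^2 + \mu_y^2 \sigma_x^2\right) + \sigma_x^2 \sigma_y^2 \right)}{\mathrm{Var}[Z]^{2}}$$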
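The theoretical statistics of the product of two independent normal variables can be evaluated directly from its first four cumulants, as in the Python sketch below. The function name `product_moments` and its argument order are hypothetical. The expressions are the standard ones for independent normal variables, with kurtosis reported as excess kurtosis, which is assumed to match the convention used in the tables.

```python
def product_moments(mu_x, var_x, mu_y, var_y):
    """Mean, variance, skewness and excess kurtosis of Z = X * Y.

    X ~ N(mu_x, var_x) and Y ~ N(mu_y, var_y) are assumed independent.
    """
    s = var_x * var_y                          # sigma_x^2 * sigma_y^2
    a = mu_x ** 2 * var_y + mu_y ** 2 * var_x
    mean = mu_x * mu_y
    variance = a + s
    third_cumulant = 6.0 * mu_x * mu_y * s
    fourth_cumulant = 6.0 * s * (2.0 * a + s)
    skewness = third_cumulant / variance ** 1.5
    excess_kurtosis = fourth_cumulant / variance ** 2
    return mean, variance, skewness, excess_kurtosis


# Two standard normal variables: mean 0, variance 1, skewness 0, excess kurtosis 6.
print(product_moments(0.0, 1.0, 0.0, 1.0))
```

For two standard normal variables this gives an excess kurtosis of 6, consistent with the heavy tails of the product.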
3. Results
We used the function to estimate the PDF of the product of two uncorrelated normal variables $X$ and $Y$. The range of each variable was computed as a function of its mean and standard deviation. In cases where the range of the variable contained zero, values in a small neighbourhood of zero were excluded.
The procedure implemented in R was used to calculate and plot the density function of the product (red/dark grey). For comparison, a Monte Carlo Simulation of the same two normal variables was run and the density of the simulated products was plotted on the same graph (green/light grey). As we did not know the shape of the distribution of the product, we approximated it by simulation: we generated a large number of products and drew a histogram of the points to obtain an estimate of the shape of the distribution.
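A Monte Carlo comparison of this kind can be sketched in Python with the standard library alone. The function names, the fixed seed, the sample size and the number of bins are all assumptions chosen for illustration, and the parameters in the demonstration are arbitrary rather than taken from the examples in the paper.

```python
import random
import statistics


def simulate_product(mu_x, sigma_x, mu_y, sigma_y, n, seed=0):
    """Draw n products X * Y of independent normal variables."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [rng.gauss(mu_x, sigma_x) * rng.gauss(mu_y, sigma_y)
            for _ in range(n)]


def histogram_density(samples, bins):
    """Histogram scaled to integrate to one: (bin centres, density values)."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        index = min(int((s - lo) / width), bins - 1)  # put the maximum in the last bin
        counts[index] += 1
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    density = [c / (len(samples) * width) for c in counts]
    return centres, density


samples = simulate_product(2.0, 0.5, 3.0, 0.5, 20000, seed=1)
centres, density = histogram_density(samples, 50)
print(statistics.fmean(samples), statistics.pvariance(samples))
```

The histogram values can then be plotted against the numerically integrated density on the same axes.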
A comparative table of the values of the product statistics (mean, variance, skewness and kurtosis) has also been included. Three sets of values were considered: the theoretical values of these statistics, which were calculated using the moment-generating function; the Monte Carlo Simulation values, which were calculated from the simulated sample; and the values obtained from the density function calculated using Rohatgi’s theorem.
3.1. Scenarios 1, 2 and 3
For Scenario 1, we considered three examples of two positive normal variables, Examples 1 (a), 1 (b) and 1 (c), which differ in the parameters of the two variables.
The graphical result of Example 1 (a) is shown in Figure 1. We observe very good agreement between the simulation of the product (using the Monte Carlo Simulation) and the estimate of the product obtained with the R function defined in the previous section. The two curves have the same shape and coincide closely. This graph and the following graphs in the paper illustrate the agreement between the shapes of the two approximations, but they do not prove how well either one approximates the true density of the product.
In Table 1, we compare the values of the statistics mean, variance, skewness and kurtosis (excess) for the theoretical values of the product distribution (first line), Monte Carlo Simulation (second line) and Rohatgi’s Theorem approximation (third line). We added the bias and the relative bias of the numerical values with respect to the theoretical values of the product distribution, which were calculated using the expressions for the mean, variance, skewness and kurtosis defined in the previous section.
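The bias columns can be illustrated with the short Python sketch below. The definitions used here are assumptions, since the text does not spell them out: bias is taken as the estimate minus the theoretical value, and relative bias as that difference divided by the absolute theoretical value. A theoretical value of zero (for example, a skewness of zero) leaves the relative bias undefined, which the sketch reports as `None`.

```python
def bias(estimate, theoretical):
    """Signed difference between an estimate and the theoretical value."""
    return estimate - theoretical


def relative_bias(estimate, theoretical):
    """Bias as a fraction of the theoretical value, or None if that value is zero."""
    if theoretical == 0:
        return None  # undefined, e.g. for a theoretical skewness of zero
    return (estimate - theoretical) / abs(theoretical)


print(bias(1.02, 1.0), relative_bias(2.2, 2.0), relative_bias(0.1, 0.0))
```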
For Scenario 2, we considered three examples of two negative normal variables, Examples 2 (a), 2 (b) and 2 (c), which differ in the parameters of the two variables.
In Table 2, we compare the values of the statistics mean, variance, skewness and kurtosis (excess) for the theoretical values of the product distribution (first line), Monte Carlo Simulation (second line) and Rohatgi’s Theorem approximation (third line). As in Scenario 1, the values of the mean and variance are in full agreement; only skewness and kurtosis show small differences, which, in general, can be considered non-significant.
In Scenario 3, we considered two normally distributed variables, one positive and the other negative. We considered three examples, Examples 3 (a), 3 (b) and 3 (c), which differ in the parameters of the two variables.
In Table 3, we compare the values of the statistics mean, variance, skewness and kurtosis (excess) for the theoretical values of the product distribution (first line), Monte Carlo Simulation (second line) and Rohatgi’s Theorem approximation (third line). The same results as those in Scenarios 1 and 2 are observed.
3.2. Scenario 4 and Scenario 5
In Scenario 4, we had a normally distributed variable whose range contains zero and a positive variable. We considered Example 4 (see Figure 2). The graph shows anomalous behaviour of the estimate produced by the R function when the value of the product $Z$ is very near zero. In this situation, the estimate becomes unstable as a consequence of the instability of the very small values of the variables near zero. The same situation is observed for Scenario 5 (see Figure 3).
In this example, there were greater differences than in the previous examples (see Table 4), although, except in the case of kurtosis, they do not appear to be significant.
In Scenario 5, we had a normally distributed variable whose range contains zero and a negative variable. We considered Example 5 (see Figure 3). As in Scenario 4, there were greater differences than in the examples in Scenarios 1, 2 and 3 (see Table 5), although they do not appear to be significant.
3.3. Scenario 6
When the ranges of both normally distributed variables contain zero, the application of Rohatgi’s theorem is more difficult because it is not defined at zero. In this case, we split the original range of each of the two normal variables into two subintervals, one on each side of zero. We considered nine situations, Examples 6 (a) to 6 (i).
In Table 6 and Table 7, we compare the values of the statistics. We can observe some significant differences between the theoretical values and the values obtained from the estimated PDF. The differences are greater for skewness and kurtosis because of the anomalous behaviour of the PDF estimate when the value of the variable is near zero.
The examples in which the ranges of the variables contain zero show much worse results than the cases analysed above. The greatest divergences occur in the value of kurtosis. When the value of the variable approaches zero, the estimated density differs markedly from the values obtained through the Simulation processes and from the theoretical values.
4. Discussion
This work has focused on the estimation of the probability density function (PDF) of the product of two uncorrelated normal variables. Using Rohatgi’s theorem, we can express the value of this function as an integral. Unfortunately, the integral does not have an analytical expression, so it is necessary to use numerical integration to approximate its values over specific intervals. We have developed a function in R to estimate the PDF of the product of these two normal variables and obtain a vector of values.
As an example, we used a graphical representation of the product density function obtained with the values reported by the R function and we compared it with the probability density function of a Monte Carlo Simulation of the product of two variables. When we had two normally distributed variables without a zero value in their range, the two approximations showed a very high degree of agreement.
On the other hand, we also compared the values of the following distribution statistics: mean, variance, skewness and kurtosis. In this case, we used the following three values:
- (i) The values of the theoretical distribution of the product. These values were obtained through the moment-generating function.
- (ii) The values obtained in the Simulation process.
- (iii) The values of the estimation of the distribution through the R function.
In general, the results were very close in all the cases analysed. We observed significant differences only in those cases where the range of the normal variables included zero.
In summary, the new R function offers an alternative way to estimate the PDF of the product of two uncorrelated normal variables. Its implementation combines Rohatgi’s theorem with numerical integration in R, giving researchers and analysts a practical tool for this calculation, with results that closely match simulation and theory when the ranges of the variables do not contain zero.
Future work should focus on analysing the observed differences and on improving the estimate of the density function in those situations where the range of the normally distributed variable includes zero. A more ambitious goal would be to generalise Rohatgi’s theorem to the case of variables with a non-zero correlation. Another future challenge is to explore similar procedures for calculating the PDFs of products of non-Gaussian variables.