1. Introduction
Mathematical models [
1,
2] are utilized in many areas as compelling instruments for studying composite processes, but our understanding is generally restricted by uncertainties: inexact data, imperfect understanding, and subjective estimates, etc. Because of that, tools are necessary to estimate how uncertainties [
3] in the input affect certain output model results.
The goal of our work is in the field of environmental security [
4,
5,
6]. To establish itself as a reliable simulation tool, modern mathematical models of remote transport of air pollutants should involve lots of chemical and photochemical reactions [
7]. Monte Carlo algorithms [
8,
9,
10,
11] are the best apparatus for multidimensional air pollution modeling [
6].
By definition, Sensitivity analysis (SA) [
12,
13,
14,
15] is the study of how much the uncertainty in the input data of a model (because of inaccurate evaluations, estimations, and data compression, etc.) is reflected in the accuracy of the output results.
SA is a powerful procedure that verifies and improves the mathematical models [
16,
17]. It specifies the key role of SA when mathematical modeling is frequently the only strategy to investigate a complicated phenomenon. Furthermore, the authentic explanation of SA calculations relies on the effectiveness of methods supplying SA.
The input data for SA has been derived during runs of a large-scale model of long-range transport of pollutants in the air—
Unified
Danish
Eulerian
Model (UNI-DEM) [
18,
19,
20,
21]), which includes a large geographical region (4800 × 4800 km) covering the whole Mediterranean, Europe, and parts of Asia and Africa.
The main physical, chemical and photochemical processes between the investigating species, contaminations, and rapidly changing meteorological conditions are taken into account by UNI-DEM. The reason for selecting it is because, in UNI-DEM, the chemical processes are taken into account in a very precise way. UNI-DEM is described mathematically [
19,
21] by the following system of partial differential equations:
where
—pollutant concentrations,
—wind components along the coordinate axes,
—diffusion coefficients,
—space emissions,
—dry and wet deposit coefficients, (),
—nonlinear functions expressed the reactions between pollutants.
A fundamental role in UNI-DEM is played by the chemical reactions, and they connect the equations in the model in the system precisely. The presence of chemical processes determines the nonlinearity and the ‘stiffness’ property of the system of equations (see [
7]). The model’s chemical scheme is the condensed CBM-IV (Carbon Bond Mechanism) proposed in [
1] and improved in [
19]. UNI-DEM involves 35 pollutants and 116 chemical reactions—69 are time-dependent and 47 are time-independent.
Let us assume that the model can be introduced by a model function [
22]:
The concept of the Sobol approach is given by the following representation of
and a constant
[
22]:
The description (
1) is noted to as the ANOVA-representation of
if [
22]:
It guarantees that the right-hand side functions of (
1) are determined in a distinctive way, where
. The quantities
are called total and partial variances [
22]. An analogical decomposition is fulfilled for the total variance:
Sobol global sensitivity indices [
22,
23] are determined by:
Then the
total
sensitivity
index (TSI) of input parameter
is determined by [
24]:
where
is the
order SI for
.
By definition [
13], if
, then the quantity
is named
the main effect of
; if
, then
are named
two-way interactions (second-order SIs); if
, then
are the
three-way interactions (third-order SIs) and so on. The set of input parameters is categorized with respect to their TSI [
13]: very important if
; important if
; unimportant if
and irrelevant if
.
To introduce the
total sensitivity of the output variance to an input parameter
one can exhibited its TSI:
(see [
2]). Therefore, performance SA using Sobol’s approach is turned into a problem for high-dimensional numerical integration [
25].
2. Algorithms and Methods
Let us examine the following high-dimensional problem:
and .
Let
be two integers. By definition [
26,
27], an elementary
s-interval in base
b as a subset of
of the form
,
for all
. A
-net in base
b is a sequence
of
points of
such that
of
. A
-sequence in base
b is an infinite points sequence
in such a way that for all
, the sequence
is a
-net in base
b.
I. M. Sobol [
26] determines the
-meshes and
sequences, which are
-nets and
-sequences in base 2, introduced in 1988 by H. Niederreiter [
27].
Now let us introduce the first algorithm that we are going to use; the well-known Sobol sequence (SOBOL-SEQ) [
28,
29]. To generate its
j-th component, we will choose a polynomial:
and
are either 0 or 1. The sequence
is defined by
⊕ is the bit-by-bit
exclusive-or operator. For the merits
is fulfilled in that every
is odd. The direction numbers
are determined with
. Then the
j-th component of the
i-th point in a SOBOL-SEQ, is determined with
where
is the
k-th binary digit of
. For generating the SOBOL-SEQ, the algorithm with Gray code implementation [
30] and SIMD-oriented Fast Mersenne Twister (SFMT) [
31] has been used.
We now briefly describe the three algorithms based on modified Sobol sequences. They produced the best results for the problem under consideration up to now [
32]. We use the shortcut
MCA-MSS for the Monte Carlo algorithms based on Modified Sobol Sequences.
In the first algorithm
MCA-MSS-1 [
32], we choose a Sobol point (SP)
and a pseudorandom point (p.p.)
uniformly distributed (u.d.) over the sphere. The sphere has a centrum
and a radius
. If the
i-th SP is
then
with a p.d.f.
is determined in the following way
where
is a unique u.d. vector in
and
is the
shaking radius.
It is assumed that
,
. The unit cube
is divided into
disjoint sub-domains:
with
for all
.
Now in every
s-dimensional sub-domain
exactly one
point
is taken and
. Then MCA-MSS-1 gives the following approximate value of (
2):
For the second algorithm
MCA-MSS-2 [
33], two SP
and
are taken so that
is chosen in the same way as used in MCA-MSS-1. The
is selected to be symmetric to
in correspondence with the central point
in
. Now there are exactly
points. Now, we evaluate all
and
, for
, and MCA-MSS-2 gives the following approximate value of (
2):
For
MCA-MSS-1 and
MCA-MSS-2, the Box–Muller transform is applied for the
s-dimensional vector
[
34]: if
s is even, then
and
is a random point in
. Therefore, a random point
over the
s-dimensional sphere is determined by
.
In the third algorithm
MCA-MSS-2-S [
35], the original domain of integration is divided into
disjoint subdomains with equal volumes
. Now, the first p.p.
is generated u.d. inside the subdomain
. The second point
is computed to be symmetric to
in correspondence with the central point
in
. This is similar to MCA-MSS-2 as the
shaking is with different radiuses
in each subdomain. This algorithm is similar to the stratified symmetrized Monte Carlo [
26], which is why we use the notation MCA-MSS-2-S. The value of (
2) obtained approximately with MCA-MSS-2-S is given by:
The algorithms MCA-MSS-2 and MCA-MSS-2-S have an optimal rate of convergence (
) for the class of continuous functions with continuous first derivatives and bounded second derivatives [
35].
Now to introduce lattice rules, let
, and N is a natural number. By definition [
36], the point set
in
with
for all
is named a lattice point set and
is named its generating vector. The algorithm that uses a lattice point set is named a lattice sequence [
37].
We will use this rank-1 lattice sequence [
38]:
where
is an integer,
is the generating vector and
denotes the fractional part of
z.
There exists an optimal choice of
according to [
39]:
for the function
[
39],
and
do not depend on
N.
When
, the following optimal construction exists [
40].
Consider the following generating vector [
38]:
using
and
(
) is the corresponding number of the generalized Fibonacci series [
38]. Then
Our generating vector (
3) is transformed into [
40]:
We construct a special lattice sequence with the generating vector (
4), namely
LATTICE-FIBO.
Now we will improve the convergence of the lattice sequence using the recently developed component and the Component Fast Construction method (CBC) [
41,
42,
43]. Now we construct a special lattice sequence
LATTICE-CBC with an optimal generating vector following the idea in [
44,
45]. At the first step of LATTICE-CBC,
is generated by the CBC. After that, the points of LATTICE-CBC are obtained by the formula
The approximate value
of (
2) is evaluated in the following way:
Let
be the finite field with
b elements for a prime power
b (when b is a prime,
is the residue class ring modulo
b) [
46]. Using
and a bijection
with
, we generate a digital
-sequence over
. Now we take
matrices
over
. The matrices are of the form
To now generate one of the points
, of the
-sequence, we record
n in its
b-adic (i.e., base
b) expansion
with
and
for all
i large enough. Now we choose the column vector
To generate the point
, for the i-th coordinate
, the value of
is derived by multiplying the i-th matrix
by
in
, which derives a column vector over
:
The elements
, for
, are now the
b-adic digits of
, i.e.,
Now the sequence derived in this way
defines a digital sequence by generating matrices
[
47].
Now we will consider two possible choices for the generating matrices. The first digital sequence is generating matrices based on an implementation of the Sobol sequence from [
48] with 21201 dimensions, and we called it
DIGITAL-SOBOL. The second digital sequence uses the construction of Xing and Niederreiter [
49]. This sequence was implemented by Pirsic [
50] and can be considered as the currently optimal low-discrepancy sequences with a rate of convergence
. We will call this sequence
DIGITAL-XINGN.
3. Calculations and Results
The efficient stochastic algorithms, namely SOBOl-SEQ, MCA-MSS-1, MCA-MSS-2, MCA-MSS-2-S, LATTICE-FIBO, LATTICE-CBC, DIGITAL-SOBOL and DIGITAL-XINGN, have been applied to sensitivity studies with respect to emission levels and some chemical reactions rates of the concentration variations of UNI-DEM pollutants. The estimated quantity is denoted by EQ and its reference value by RV. The relative error is denoted by RE and the approximate evaluation by AE.
In the case of the
sensitivity studies with respect to emission levels, we will study the sensitivity of the model output (in terms of mean monthly concentrations of several important pollutants—in our case, this is ammonia in Milan) with respect to a variation of input emissions from the anthropogenic pollutants, which consists of four different components
as follows:
The output of the model is the mean monthly concentration of the following three pollutants:
In the case of the
sensitivity studies with respect to emission levels, the results for REs for the AE of the
and
, using
SOBOl-SEQ, MCA-MSS-1, MCA-MSS-2, MCA-MSS-2-S, LATTICE-FIBO, LATTICE-CBC, DIGITAL-SOBOL and DIGITAL-XINGN are shown in
Table 1,
Table 2,
Table 3,
Table 4 and
Table 5, respectively. The quantity
is represented by a four-dimensional integral, whereas the rest are represented by eight-dimensional integrals.
In the case of the
sensitivity studies with respect to chemical reactions rates, we will study the ozone concentration in Genova according to the rate variation of these chemical reactions: ##
(time-dependent) and
(time-independent) of the CBM-IV scheme ([
19]) given as follows:
Here, “hv” represents the sunlight in the photochemical reactions and indicates that a photon is proportional to its frequency by Plank’s constant.
In the case of the
sensitivity studies with respect to chemical reactions rates, the results for REs for the AE of the
and
, using
SOBOl-SEQ, MCA-MSS-1, MCA-MSS-2, MCA-MSS-2-S, LATTICE-FIBO, LATTICE-CBC, DIGITAL-SOBOL and DIGITAL-XINGN, are shown in
Table 6,
Table 7,
Table 8,
Table 9 and
Table 10, respectively. As in the first case study, the quantity
is represented by a six-dimensional integral, whereas the rest are represented by twelve-dimensional integrals.
4. Discussion of the Results
In the case of the
sensitivity studies with respect to emission levels, a detailed discussion has been made with the previous best available approaches described in [
51,
52,
53].
For the number of samples
for the model function
, the best algorithm is the DIGITAL-XINGN, followed by MCA-MSS-2-S and MCA-MSS-1—see the results in
Table 1 and
Table 2 for the maximum number of samples. A big improvement over the older best available result is obtained—the new result is 3e-10 vs. 2e-08 [
32,
35]. For a number of samples
for the total variance
D, the best algorithm is the DIGITAL-XINGN, followed by MCA-MSS-2-S and DIGITAL-SOBOL—see the results in
Table 3 and
Table 4 for the maximum number of samples [
32,
35].
Our discussion here includes a detailed comparison with the previous results in the field in [
54,
55]. For the total and first-order sensitivity indices in
Table 5, it can be seen that the two digital sequences improve the results produced by the best available approaches up to now [
32,
33]. The DIGITAL-SOBOL gives the best results for
,
,
,
and
, while the DIGITAL-XING gives the best results for
,
and
. The general conclusion is that our methods significantly improved the accuracy of the sensitivity indices described in [
56]. According to [
33,
54,
56], the best available approaches up to now were the SOBOL-SEQUENCE and MCA-MSS-1 for
,
(these two give exactly the same result for all quantities except
), and MCA-MSS-2 was the best available algorithm for
,
,
,
,
,
. From
Table 5, one can conclude that between the two lattice sequences, LATTICE-CBC is more accurate by at least 1. Furthermore, we can observe that the worst are MCA-MSS-2-S and LATTICE-FIBO, and the last method gives unsatisfactory relative errors. According to [
32,
54,
56], the best improvements for the sensitivity indexes are obtained for
, the new result is 4e-06 vs. 4e-03; for
, the new result is 1e-05 vs. 1e-02. For all total sensitivity indices, there are serious improvements over the existing best available results. For
, the new result is 3e-07 vs. 1e-05; for
, 1e-05 vs. 1e-03; for
, 2e-06 vs. 4e-05; and for
, 1e-06 vs. 1e-02. In [
24], it is mentioned that having the smallest possible sensitivity indices is the most important aspect of the model. In our case, these are
and
, and the improvement is with three and four orders, respectively, which will be crucial for taking into account the reliability of the results of the model.
The performance of the algorithms can be generalized in such a way: the two digital sequences proposed by the authors yield the smallest relative errors for all sensitivity indices; algorithm MCA-MSS-2 is after that, followed by SOBOL-SEQUENCE, MCA-MSS-1 and LATTICE-CBC. The worst results are produced by the MCA-MSS-2-S and LATTICE-FIBO. Comparing our results with the results obtained in [
32,
35,
54,
56] the main advantages of our methods include the significant improvement in the accuracy and lower computational complexity.
In the case of
sensitivity studies with respect to chemical reaction rates, a precise discussion has been made with the previous best available approaches from [
4,
57,
58].
For the number of samples
for the model function
, the best algorithm is the DIGITAL-XINGN, followed by MCA-MSS-2-S and MCA-MSS-1—see the results in
Table 6 and
Table 7 for the maximum number of samples. There is a huge improvement according to the sensitivity studies in [
32,
35]—the new result is 6e-11 vs. 2e-07. For the number of samples
for the total variance
D, the best algorithm is the DIGITAL-SOBOL, followed by DIGITAL-XINGN, MCA-MSS-2 and SOBOL-SEQUENCE—see the results in
Table 8 and
Table 9 for the maximum number of samples [
4].
The discussion in this case includes a comparison between our obtained results with other studies and other papers already published [
32,
59,
60]. For all sensitivity indices in
Table 10, it can be seen that the two digital sequences improve the results produced by the best available approaches except for the sensitivity index
obtained in [
32,
33]. The DIGITAL-SOBOL gives the best results for
,
,
,
,
,
,
,
,
,
,
,
and
, while the DIGITAL-XING gives the best results for
,
,
,
and
. The general conclusion is that our methods show a significant improvement in the accuracy vs. the other calculations in this field [
61]. According to [
4,
33,
59], the best available approaches up to now were the SOBOL-SEQEUNCE for
,
,
,
,
,
,
,
,
, MCA-MSS-1 was the best for
and
, and MCA-MSS-2 was the most accurate for
,
,
,
,
,
,
. According to [
35,
59,
60], the best improvement for the relative error is that for the sensitivity index
, the new result is 4e-06 vs. 3e-04; for
, the new is 8e-07 vs. 4e-05; and for
, the new result is 2e-05 vs. 2e-03. From
Table 10, focusing on the relative errors, one can conclude that between the two lattice sequences, LATTICE-CBC is more accurate by at least two orders. Furthermore, we can observe that MCA-MSS-2-S is one order better than LATTICE-FIBO and one order worse than the LATTICE-CBC.
We can conclude that the two digital sequences proposed by the authors yield the smallest relative errors for all sensitivity indices except , and DIGITAL-SOBOL has the edge; the algorithm MCA-MSS-2 is closely behind them, followed by SOBOL-SEQUENCE, MCA-MSS-1, LATTICE-CBC and MCA-MSS-2-S. The lattice sequence LATTICE-FIBO gives unsatisfactory relative errors for most of the quantities.
Based on the described comparison with other sensitivity studies and other papers already published in the field [
16,
17,
62,
63], the general conclusion from our discussion is that the main advantages of the proposed methods
LATTICE-FIBO, LATTICE-CBC, DIGITAL-SOBOL and DIGITAL-XINGN for the very important large-scale air pollution model UNI-DEM are the much higher accuracy and lower computational complexity.
5. Conclusions
The present study is in a very important area of environmental safety. High levels of pollution can damage ecosystems and harm plants, animals and humans, and it is of great importance to study the levels of pollution precisely. That is why it is of great importance to recognize whether the pollution levels are below some critical values and, if so, to establish a well-founded control system to retain the pollution levels within these limits. These problems can be accomplished successfully by the proposed highly efficient stochastic approaches for studying the sensitivity of various pollution-related processes.
The computational efficiency in terms of relative error of one of the best available stochastic methods for high-dimensional integration has been performed for the sensitivity of the UNI-DEM model output to find the variation of rate constants of chosen chemical reactions and the variation of selected input emissions of the anthropogenic pollutants.
The two proposed digital sequences based on Sobol and Xing-Niederreiter matrices show an essential improvement over the existing best available results produced by the modified Sobol sequences. The proposed approaches by the authors LATTICE-CBC, DIGITAL-SOBOL and DIGITAL-XNING are established as the new best available stochastic methods for this particular very important large-scale air pollution model UNI-DEM.
The obtained results will help test and improve the mathematical models and enable a well-founded explanation by the relevant specialists. By recognizing the main chemical reactions that influence the performance of the system, physicists and chemists will be able to gather important information to improve the model, which will increase the reliability and sustainability of forecasts. Therefore, through our sensitivity results, the mathematical model will contribute to a more accurate evaluation of losses in agriculture and, most importantly, enable an estimation of the effects of harmful emissions on human health.