1. Introduction
A Bayesian Network (BN) [1] is a probabilistic graphical model that represents a joint distribution over a set of random variables in a compact form that exploits conditional independence relationships among the random variables. The random variables (RVs) are represented as nodes in a directed acyclic graph (DAG), in which a directed edge represents a direct dependency between two nodes and no directed cycles are allowed. Bayesian Networks have become a powerful tool for representing uncertain knowledge and performing inference under uncertainty, and they have been applied in many domains, such as image understanding, data fusion, medical diagnosis and fraud detection.
Hybrid Bayesian Networks (HBNs) can contain both discrete and continuous RVs. An important subclass, the conditional linear Gaussian (CLG) networks, consists of networks in which all discrete random variables have only discrete parents, all continuous random variables have Gaussian distributions and the conditional mean of any Gaussian RV is linear in its Gaussian parents. Exact inference methods exist for CLG networks [2,3]. However, even in special cases for which exact inference in discrete Bayesian Networks (BNs) is tractable, exact inference in CLG networks can be NP-hard [4]. In particular, the posterior marginal distribution for each individual Gaussian random variable is a mixture of Gaussian distributions, and the number of components needed to compute the exact distribution for a given random variable may be exponential in the number of discrete variables in the network. Furthermore, no exact algorithms exist for general conditional Gaussian (CG) networks. Therefore, approximate inference for CG networks is an important area of research.
Approximate inference algorithms for HBNs can be roughly classified into six categories: (1) Sampling (SP), (2) Discretization (DS), (3) Structure Approximation (SA), (4) Variational Inference (VI), (5) Clustering (CL) and (6) Message Passing (MP) approaches.
SP algorithms [5,6,7,8] draw random samples to use for inference and can handle BNs of arbitrary structure. Henrion [9] presented a basic sampling approach, called logic sampling, for approximate inference in discrete Bayesian networks. Logic sampling generates samples beginning at root nodes and following links to descendant nodes, terminating at leaf nodes of the graph. If a sampled realization contains an evidence node whose value does not match the observed value, it is rejected. The result is a sample from the conditional distribution of the sampled nodes given the observed values of the evidence nodes. This rejection strategy may require a very large number of samples to converge to an acceptable inference result. Further, it cannot be applied when there are continuous evidence nodes, because in this case all samples would be rejected. Fung & Chang [10] suggested a method that sets all evidence variables to their observed values, samples only the non-evidence variables and weights each sample by the likelihood of the evidence variables given the sampled non-evidence variables. This likelihood weighting algorithm, which can be applied when evidence nodes are continuous, has become very popular. However, when the evidence configuration is highly unlikely, this method can result in very poor accuracy. Pearl [1] proposed a Gibbs sampling approach for Bayesian networks. His algorithm is a special case of the more general class of Markov Chain Monte Carlo algorithms [11]. The efficiency of Gibbs sampling can be dramatically improved by sampling only a subset (called a cutset) of random variables that breaks all loops in the graph and performing exact inference on the remaining singly connected network [12]. Nevertheless, for any SP algorithm, a very large number of samples may be required for challenging BNs, such as those with complex topologies, very unlikely evidence configurations and/or deterministic or near-deterministic relationships.
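As a minimal illustration of the likelihood-weighting idea (a hypothetical toy example, not the implementation used in this paper), the following Java sketch estimates a discrete posterior in a tiny CLG network with a continuous evidence node; each sample is weighted by the Gaussian likelihood of the observed value:

```java
import java.util.Random;

/**
 * Minimal likelihood-weighting sketch (hypothetical toy network).
 * Network: discrete A (2 states) -> continuous X ~ N(mu[A], 1) -> evidence Y ~ N(X, 1).
 * Evidence nodes are fixed to their observed values; each sample is weighted by
 * the likelihood of the evidence given the sampled non-evidence variables.
 */
public class LikelihoodWeighting {
    public static void main(String[] args) {
        double[] priorA = {0.7, 0.3};      // P(A)
        double[] muX    = {0.0, 5.0};      // mean of X given A
        double yObs     = 4.0;             // observed value of evidence node Y
        double[] weightedCountA = new double[2];
        Random rng = new Random(42);

        for (int s = 0; s < 100_000; s++) {
            int a = rng.nextDouble() < priorA[0] ? 0 : 1;   // sample A from its prior
            double x = muX[a] + rng.nextGaussian();          // sample X | A
            double w = gaussianPdf(yObs, x, 1.0);            // weight = p(Y = yObs | X = x)
            weightedCountA[a] += w;
        }
        double total = weightedCountA[0] + weightedCountA[1];
        System.out.printf("P(A=0 | Y=%.1f) ~= %.3f%n", yObs, weightedCountA[0] / total);
        System.out.printf("P(A=1 | Y=%.1f) ~= %.3f%n", yObs, weightedCountA[1] / total);
    }

    static double gaussianPdf(double v, double mean, double sd) {
        double z = (v - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
    }
}
```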
DS algorithms [13] convert a hybrid BN to a discrete BN by discretizing all continuous RVs in the hybrid BN. This approach replaces each continuous variable with a set of intervals, called bins. After the conversion, the discretized BN is handled by a discrete inference algorithm (e.g., Reference [1]). Kozlov & Koller [13] provided an improved discretization that efficiently adapts the bins to the shape of a continuous RV's distribution. However, DS algorithms start with an approximation due to discretization and this approximation can yield inaccurate posterior distributions. Accuracy can be improved with finer discretization, but possibly at a major additional cost in time and space. Furthermore, there is a time cost for discretization itself and a need for methods to choose the granularity of the discretization to balance accuracy against computation cost.
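To make the binning idea concrete, the following toy Java sketch (a hypothetical example, not a cited method) discretizes a standard normal variable into equal-width bins whose probabilities are approximated from the density at the bin midpoints:

```java
/**
 * Toy illustration of discretizing a continuous RV for a DS-style approach
 * (hypothetical example): a standard normal X ~ N(0,1) is replaced by k
 * equal-width bins on [-3, 3]; each bin's probability is approximated by
 * the density at the bin midpoint times the bin width, then normalized.
 */
public class DiscretizeGaussian {
    public static void main(String[] args) {
        int k = 6;
        double lo = -3.0, hi = 3.0, width = (hi - lo) / k;
        double[] prob = new double[k];
        double total = 0.0;
        for (int i = 0; i < k; i++) {
            double mid = lo + (i + 0.5) * width;   // bin midpoint
            prob[i] = Math.exp(-0.5 * mid * mid) / Math.sqrt(2.0 * Math.PI) * width;
            total += prob[i];
        }
        for (int i = 0; i < k; i++) {
            prob[i] /= total;                       // normalize to a discrete distribution
            System.out.printf("bin [%5.2f, %5.2f): P = %.3f%n",
                    lo + i * width, lo + (i + 1) * width, prob[i]);
        }
    }
}
```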
SA algorithms change an intractable hybrid BN (e.g., a conditional nonlinear Gaussian network) into a tractable hybrid BN (e.g., a conditional linear Gaussian network). After the conversion, a hybrid inference algorithm that can handle the tractable hybrid BN is used for inference. Shenoy [14] proposed an SA algorithm in which any type of continuous RV can be approximated by a mixture of Gaussian distributions, thus converting an arbitrary hybrid BN to a CG BN. He showed how various hybrid BNs (e.g., non-Gaussian HBNs, nonlinear HBNs, HBNs with a continuous parent and a discrete child and HBNs with non-constant variance) can be converted to a CG BN. Although SA algorithms can treat various types of HBNs, they still require an appropriate CG inference algorithm for the converted HBN.
VI algorithms [15,16,17,18,19,20] treat inference as an optimization problem that minimizes the difference between a variational posterior distribution Q and the true posterior distribution P. Usually, the Kullback–Leibler divergence (KL-divergence), denoted $D_{KL}(Q \| P)$, is used to measure the difference between these two distributions. Recent research results for VI algorithms do not show better inference accuracy than SP algorithms, although VI algorithms yield results faster than SP algorithms [21]. This property is useful when exploring various models that reflect the data. Combining VI and SP algorithms can therefore be promising; Salimans et al. [22] introduced algorithms integrating the VI and SP approaches. For big data analytics, VI algorithms on distributed computing platforms have been studied in References [23,24].
CL algorithms handle loops by converting the original BN to a graph of clusters, in which each node corresponds to a cluster of nodes from the original BN, such that the graph of clusters contains no loops. A conversion step is required to form the clusters from the original BN. Among CL approaches, the popular Junction Tree (JT) algorithm has been adapted for inference in CG networks [2,3]. However, constraints required by the Lauritzen algorithm [2,3] on the form of the junction tree tend to result in cliques containing very many discrete nodes. Because inference is exponential in the number of discrete nodes in a cluster, the algorithm is often intractable even when a tractable clustering exists for a discrete network of the same structure [4]. For this reason, it is typically necessary to resort to approximate inference. Gaussian mixture reduction (GMR) has been suggested as an approximation approach [25]. GMR approximates a Gaussian mixture model (GMM) with a GMM having fewer components.
In MP algorithms, each node in the BN sends messages to relevant nodes along paths between them. The messages contain information to update the distributions of the relevant nodes. After updating, each node computes its marginal distribution. If the BN has loops, message passing may not converge. MP algorithms are also subject to uncontrolled growth in the number of mixture components. A GMR approach has been proposed to address this issue [26,27]; however, these works provided no general algorithm for applying GMR within the MP algorithm. Park et al. [28] introduced a general algorithm for MP using GMR but gave no guidance on how to trade off accuracy against computational resources in hybrid MP using GMR.
This paper presents a complete solution to the hybrid inference problem by providing two algorithms: Hybrid Message Passing with Gaussian Mixture Reduction (HMP-GMR) and HMP-GMR with Optimal Settings (HMP-GMR-OS).
The HMP-GMR algorithm prevents exponential growth of Gaussian mixture components in MP algorithms for inference in CG Bayesian networks. We present an extension of the algorithm of References [26,27] that incorporates GMR to control complexity and examine its performance relative to competing algorithms.
Each inference algorithm has its own characteristics. For example, some algorithms are faster and some are more accurate. Further, accuracy and speed can depend on the Bayesian network and the specific pattern of evidence. These characteristics can be used as guidance for choosing an inference method for a given problem. Metrics for evaluating an inference algorithm include speed, accuracy and resource usage (e.g., memory or CPU usage). In some situations, algorithm speed is the most important factor. In other cases, accuracy may be more important. For example, early stage missile tracking may require a high speed algorithm for estimating the missile trajectory, while matching faces in a security video against a no-fly database may prioritize accuracy over speed. The HMP-GMR algorithm requires a maximum number of Gaussian components as an input parameter. This maximum number of components influences both accuracy and execution time of the HMP-GMR algorithm. We introduce a pre-processing algorithm called HMP-GMR with Optimal Settings (HMP-GMR-OS), which optimizes the initial settings for HMP-GMR to provide the best accuracy on a given HBN under a bound on computation time. The HMP-GMR-OS algorithm is intended for cases in which a given HBN will be used repeatedly in a time-limited situation and a pre-processing step is desired to balance accuracy against speed of inference. Sampling approaches have been used for such situations, because of their anytime property. That is, sampling always provides an answer even if it runs out of time. In some cases, our algorithm can result in better accuracy than a sampling approach for the same execution time.
The layout of this paper is as follows. Section 2 introduces Hybrid Message Passing Inference and Gaussian Mixture Reduction. Section 3 presents the HMP-GMR algorithm, which combines the two methods introduced in Section 2. Section 4 proposes the HMP-GMR-OS algorithm to find the optimal number of allowable components in any given mixture distribution. Section 5 presents experimental results on the advantages and disadvantages of the new algorithms. Section 6 draws conclusions.
3. Extended Hybrid Message Passing Algorithm
The previous sections introduced message passing inference and Gaussian mixture reduction. This section combines these methods into an extended hybrid message passing algorithm for CG BNs.
The GMR operation is denoted by a function GMR(φ, M) that applies Equation (16), where φ is a Gaussian mixture model and M is the maximum number of allowable mixture components. To specify the algorithm, we need to define where in the inference process GMR is applied. For this, the Pi/Lambda functions for a node X and the Pi/Lambda messages in Type 14 and Type 17 are chosen. Hence, the function GMR(·, M) is applied to Equations (7)–(11), (14) and (15), where M denotes the maximum allowable number of components; the extended algorithm uses the resulting reduced mixtures in place of the original ones.
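As an illustration of the kind of reduction GMR(φ, M) performs, the following univariate Java sketch repeatedly applies a moment-preserving merge to the closest pair of components until at most M remain. This is a simplified stand-in: it assumes a mean-distance merging criterion rather than the specific cost used in Equation (16).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal univariate sketch of a GMR(phi, M) operation: repeatedly merge the two
 * closest components (moment-preserving merge) until at most M components remain.
 * Illustrative only; the paper's Equation (16) may use a different merging cost.
 */
public class GaussianMixtureReduction {
    static class Component {
        double weight, mean, var;
        Component(double w, double m, double v) { weight = w; mean = m; var = v; }
    }

    static List<Component> reduce(List<Component> gmm, int maxComponents) {
        List<Component> phi = new ArrayList<>(gmm);
        while (phi.size() > maxComponents) {
            // find the pair of components with the smallest distance between means
            int bi = 0, bj = 1;
            double best = Double.MAX_VALUE;
            for (int i = 0; i < phi.size(); i++)
                for (int j = i + 1; j < phi.size(); j++) {
                    double d = Math.abs(phi.get(i).mean - phi.get(j).mean);
                    if (d < best) { best = d; bi = i; bj = j; }
                }
            Component a = phi.get(bi), b = phi.get(bj);
            // moment-preserving merge: match the weight, mean and variance of the pair
            double w = a.weight + b.weight;
            double m = (a.weight * a.mean + b.weight * b.mean) / w;
            double v = (a.weight * (a.var + a.mean * a.mean)
                      + b.weight * (b.var + b.mean * b.mean)) / w - m * m;
            phi.remove(bj);   // remove the later index first to keep bi valid
            phi.remove(bi);
            phi.add(new Component(w, m, v));
        }
        return phi;
    }

    public static void main(String[] args) {
        List<Component> gmm = new ArrayList<>();
        gmm.add(new Component(0.3, 0.0, 1.0));
        gmm.add(new Component(0.3, 0.2, 1.0));
        gmm.add(new Component(0.4, 5.0, 2.0));
        for (Component c : reduce(gmm, 2))
            System.out.printf("w=%.2f mean=%.2f var=%.2f%n", c.weight, c.mean, c.var);
    }
}
```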
These operations are implemented in Algorithm 1, called the Hybrid Message Passing Algorithm with Gaussian Mixture Reduction (HMP-GMR). Algorithm 1 is an extension of the Hybrid Message Passing algorithm (HMP) from Reference [29], to which we apply GMR. An initial version of this algorithm was introduced in Reference [28]. In contrast with the initial version, this is an anytime algorithm that can provide a solution even if it is interrupted before completion.
HMP-GMR (Algorithm 1) has five inputs. The first input, net, is the Hybrid BN with specified evidence nodes and their values. The second input, max_time, is the maximum execution time, used to control how long the algorithm runs by comparing it with the current execution time, that is, the time elapsed since the algorithm started. The third input, max_iteration, is the maximum number of iterations allowed, where an iteration is one round in which all nodes in the BN perform their operations. The fourth input, max_nc, is the maximum number of Gaussian components that may be output by the GMR function. The fifth input, max_prcs, is a threshold on the distance between the posterior distributions of nodes in the current and previous iterations; the algorithm terminates when this distance falls below the threshold. HMP-GMR outputs approximate posterior distributions of all nodes. Given these inputs, Algorithm 1 proceeds as follows.
- Line 2
The algorithm iterates message passing from 1 to the maximum number of iterations or until it is interrupted due to exceeding the time limit.
- Line 3
The algorithm cycles through all nodes in the BN.
- Line 4
For the j-th node, all Pi messages from its parents are computed to calculate the node's Pi value. If the RV is discrete, Equation (3) is used, while if it is continuous and has only discrete, only continuous or hybrid parents, Equation (11), (17) or (18) is used, respectively.
- Line 5
All Lambda messages from the children of the j-th node are computed to calculate the node's Lambda value. If the RV is discrete, Equation (4) is used, while if it is continuous, Equation (19) is used.
- Line 6
A Pi message is sent from the j-th node to its children. If the node is discrete, Equation (5) is used, while if it is continuous Equation (20) is used.
- Line 7
A Lambda message is sent from the j-th node to its parents. If the node is discrete, Equation (6) is used, while if it is continuous and has only discrete, only continuous or hybrid parents, Equation (15), (21) or (22) (for continuous parent)/(23) (for discrete parent) is used, respectively. For each of these functions in Lines 4, 5, 6 and 7, if the current execution time exceeds the maximum execution time, the result from the function is not updated and the for-loop in Line 3 of the HMP-GMR procedure is stopped.
- Line 8
After all nodes have passed their messages (i.e., Lines 3–7), the belief function is computed for all nodes.
- Line 9
The Lambda and Pi values are multiplied and normalized for all nodes to calculate the belief function.
- Line 10
The differences between the current and previous beliefs are computed for all nodes.
- Line 11
The maximum difference max_diff between current and previous belief is selected.
- Line 12
If the maximum difference max_diff is less than the maximum precision max_prcs, the iteration of the message passing is stopped.
- Line 13
Upon stopping, the algorithm outputs approximate posterior marginal distributions for all nodes.
There are three exit points from the iteration: (1) when the iteration reaches the maximum number of allowable iterations, (2) when the maximum difference is less than the maximum precision and (3) when the current execution time for the algorithm exceeds the maximum execution time.
Algorithm 1: Hybrid Message Passing (HMP) with Gaussian Mixture Reduction (GMR) Algorithm. |
Input: a Hybrid BN net, |
a maximum execution time max_time, |
a maximum number of iterations max_iteration, |
a maximum allowable number of components max_nc, |
a maximum precision max_prcs |
Output: a set of belief functions belij |
|
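The control flow of Algorithm 1 can be summarized by the following Java skeleton. The Node interface is a placeholder for the hybrid-BN node operations described above (Equations (3)–(6), (11), (15), (17)–(23) with GMR applied); only the loop structure and the three exit conditions are reproduced, not the actual message computations.

```java
import java.util.List;

/**
 * Control-flow sketch of HMP-GMR (Algorithm 1). The Node type is a stand-in for
 * the paper's hybrid-BN node; computePiValue/computeLambdaValue/send*Messages
 * would apply the Pi/Lambda equations with GMR(., max_nc) applied to mixtures.
 */
public class HmpGmrSketch {

    interface Node {
        void computePiValue(int maxNc);     // Line 4: Pi value from parent Pi messages
        void computeLambdaValue(int maxNc); // Line 5: Lambda value from child Lambda messages
        void sendPiMessages(int maxNc);     // Line 6
        void sendLambdaMessages(int maxNc); // Line 7
        double updateBelief();              // Lines 9-10: returns |current - previous belief|
    }

    static void run(List<Node> net, long maxTimeMs, int maxIteration,
                    int maxNc, double maxPrcs) {
        long start = System.currentTimeMillis();
        for (int it = 1; it <= maxIteration; it++) {                      // Line 2
            for (Node node : net) {                                       // Line 3
                if (System.currentTimeMillis() - start > maxTimeMs) return; // time-limit exit
                node.computePiValue(maxNc);
                node.computeLambdaValue(maxNc);
                node.sendPiMessages(maxNc);
                node.sendLambdaMessages(maxNc);
            }
            double maxDiff = 0.0;                                          // Lines 8-11
            for (Node node : net) maxDiff = Math.max(maxDiff, node.updateBelief());
            if (maxDiff < maxPrcs) return;                                 // Line 12: convergence exit
        }                                                                  // iteration-limit exit
    }
}
```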
4. Optimizing the Settings of HMP-GMR
In some situations, the Hybrid Message Passing with Gaussian Mixture Reduction (HMP-GMR) algorithm performs better than other algorithms. For example, although in theory a sampling algorithm can be made as accurate as desired, for a given problem, HMP-GMR may have higher accuracy for a given limit on computation time. However, HMP-GMR requires initial settings before it executes. The performance of HMP-GMR depends on these initial settings. More specifically, HMP-GMR requires that the maximum allowable number of components max_nc and the maximum number of allowable iterations max_iteration are specified as inputs. If the maximum allowable number of components is too small, accuracy may be too low; but if it is too large, execution time may be unacceptably long. Also, the maximum number of allowable iterations can influence accuracy and execution time. The number of components required to achieve a given accuracy depends on the network topology, the placement of continuous and discrete nodes and the conditional distributions. As noted above, when the BN contains loops, the HMP-GMR algorithm may not converge. Thus, in some problems, HMP-GMR may spend many iterations without a significant improvement in accuracy.
Therefore, there is a need to trade off accuracy against execution time depending on the maximum allowable number of components and the maximum number of iterations. Different applications pose different requirements on execution time. It is assumed that the maximum allowable execution time for inference is an input parameter that is specified before the inference algorithm runs. Therefore, the optimization problem is defined as attaining the best achievable accuracy for a given constraint on execution time, by varying the maximum allowable number of components and the maximum allowable number of iterations.
Finding an exact optimum would be infeasible in the general case. Therefore, this section presents a Monte Carlo method to find approximately optimal values for a specific conditional Gaussian Bayesian network. The algorithm is appropriate for problems in which a given HBN is specified a priori and inference on the HBN will be performed repeatedly in a time-restricted setting with limits on execution time for inference. The optimization can be performed offline as a pre-processing step to find good initial settings for performing HMP-GMR inference at run time. For example, a real-time threat detection system might use a CG HBN to process sensor inputs automatically and issue alarms when suspicious combinations of sensor readings occur. Because the system runs in real time, fast execution is essential. At design time, an offline optimization can be run using the algorithm presented here to determine the best settings for the maximum number of components and the maximum number of iterations.
An optimization problem for this situation can be formulated as shown in Equation (24). In this setting, we assume that a specific HBN is given:

$(nc^{*}, it^{*}) = \arg\min_{nc \in NC,\; it \in IT} f(nc, it) \quad \text{subject to} \quad t \le max\_time \qquad (24)$

where NC denotes the set of candidate maximum allowable numbers of components {nc_1, nc_2, …, nc_p}, IT denotes the set of candidate maximum numbers of iterations {it_1, it_2, …, it_q}, f(·) is an objective function measuring the error of HMP-GMR, t denotes the execution time of HMP-GMR inference and max_time denotes the maximum execution time. We call this HMP-GMR with Optimal Settings (HMP-GMR-OS), which finds the values nc* and it* that achieve the best accuracy under the given time restriction. Equation (25) shows the objective function f(·) of HMP-GMR-OS:
$f(nc, it) = \frac{1}{|E|} \sum_{e \in E} err\big( s(net, e),\; h(net, e, max\_time, nc, it) \big) \qquad (25)$

where E denotes a set of candidate evidence configurations {e_1, e_2, …, e_m}, randomly selected for the Bayesian network net, err(·) is a function giving the error between a near-correct inference result obtained from sampling and the inference result from HMP-GMR, s(·) denotes a sampling inference algorithm used to obtain the near-correct reference results and h(·) denotes the HMP-GMR algorithm run with the maximum execution time max_time, a candidate maximum allowable number of components nc and a candidate maximum number of iterations it.
The Monte Carlo method for HMP-GMR-OS is presented as Algorithm 2, which finds the best values for the two decision variables given a Hybrid BN, a maximum execution time, a number of samples, an upper limit on the maximum number of iterations and an upper limit on the maximum allowable number of components.
Algorithm 2: HMP-GMR with Optimal Settings (HMP-GMR-OS) Algorithm. |
Input: a Hybrid BN net, |
a maximum execution time max_time, |
a number of samples num_samples, |
an upper limit on the maximum number of iterations ul_max_it, |
an upper limit on the maximum allowable number of components ul_max_nc |
Output: An optimal number of components nc and an optimal number of iterations it |
|
The HMP-GMR-OS algorithm has five inputs. The first input net is a Hybrid BN. The second input max_time is the maximum execution time for HMP-GMR inference. The third input num_samples indicates how many times the simulation should be repeated. The fourth input ul_max_it is the upper limit on the maximum number of iterations to be investigated. The fifth input ul_max_nc is the upper limit on the maximum allowable number of components to be investigated. Given these inputs, the algorithm proceeds as follows:
- Line 2
The algorithm simulates the given number of samples.
- Line 3
The algorithm randomly selects some evidence nodes from the Hybrid BN net. Also, it randomly selects a reasonable evidence value for each evidence node (i.e., a highly unlikely value is not used) and provides the i-th set of evidence values.
- Line 4
The set of evidence values is used for inference with a sampling algorithm, by which nearly correct inference results (i.e., posterior distributions) are found.
- Line 5
The maximum allowable number of components, denoted by j, is varied from 1 to the upper limit on the maximum allowable number of components ul_max_nc.
- Line 6
The maximum number of iterations, denoted by k, is varied from 1 to the upper limit on the maximum number of iterations ul_max_it.
- Line 7
The HMP-GMR algorithm is run with the Hybrid BN net, the set of evidence values, the maximum execution time max_time, the maximum allowable number of components j and the maximum number of iterations k. The HMP-GMR algorithm then provides its results (i.e., posterior distributions).
- Line 8
An inference error value between the nearly correct results and the HMP-GMR results is computed using a distance function (e.g., KL-divergence [38]).
- Line 9
The inference error value for the i-th sample is stored in the set of inference error values rjk for j and k.
- Lines 10–12
After simulating all samples, for all j and k, an average inference error avg_rjk is calculated using the set of inference error values rjk.
- Line 13
The best maximum allowable number of components and the best maximum number of iterations are selected by finding the minimum average inference error among the avg_rjk values.
- Line 14
The algorithm outputs the best values nc and it.
In summary, the HMP-GMR-OS algorithm is a pre-processing algorithm that finds optimal settings for HMP-GMR on a given HBN, improving accuracy before HMP-GMR is executed on that HBN in a practical situation.
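The structure of Algorithm 2 can be sketched as the following grid search in Java. The helper methods are placeholders for the routines described above (random evidence selection, sampling-based reference inference, HMP-GMR and a KL-divergence error measure), not actual library calls.

```java
/**
 * Control-flow sketch of HMP-GMR-OS (Algorithm 2). The helper functions
 * (randomEvidence, samplingPosteriors, hmpGmr, klDivergence) are placeholders
 * for the routines described in the text.
 */
public class HmpGmrOsSketch {

    static int[] optimize(Object net, long maxTimeMs, int numSamples,
                          int ulMaxIt, int ulMaxNc) {
        double[][] sumError = new double[ulMaxNc + 1][ulMaxIt + 1];
        for (int i = 0; i < numSamples; i++) {                          // Line 2: Monte Carlo samples
            Object evidence = randomEvidence(net);                       // Line 3
            Object reference = samplingPosteriors(net, evidence);        // Line 4: near-correct result
            for (int j = 1; j <= ulMaxNc; j++)                           // Line 5: candidate max_nc
                for (int k = 1; k <= ulMaxIt; k++) {                     // Line 6: candidate max_iteration
                    Object result = hmpGmr(net, evidence, maxTimeMs, j, k);   // Line 7
                    sumError[j][k] += klDivergence(reference, result);        // Lines 8-9
                }
        }
        int bestNc = 1, bestIt = 1;
        double bestAvg = Double.MAX_VALUE;
        for (int j = 1; j <= ulMaxNc; j++)                               // Lines 10-13: pick the minimum
            for (int k = 1; k <= ulMaxIt; k++) {
                double avg = sumError[j][k] / numSamples;
                if (avg < bestAvg) { bestAvg = avg; bestNc = j; bestIt = k; }
            }
        return new int[] { bestNc, bestIt };                             // Line 14
    }

    // Placeholder hooks; in a real implementation these wrap the inference engines.
    static Object randomEvidence(Object net) { return null; }
    static Object samplingPosteriors(Object net, Object e) { return null; }
    static Object hmpGmr(Object net, Object e, long t, int nc, int it) { return null; }
    static double klDivergence(Object p, Object q) { return 0.0; }
}
```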
5. Experiment
This section presents experiments to evaluate the performance of the HMP-GMR and HMP-GMR-OS algorithms. For evaluation of the HMP-GMR algorithm, Park et al. [28] presented simple experiments to demonstrate scalability and efficiency using two hybrid BNs. Here, more extensive experiments are performed on four BNs, chosen to be representative in terms of the number of discrete parent nodes and the number of loops in the network.
Figure 2 shows two illustrative Conditional Gaussian (CG) BNs (CG BNs 1 and 2), each containing a discrete node A with 4 states and a set of continuous nodes. The two BNs differ in the links among the continuous nodes. The first, shown in Figure 2a, has no undirected cycles involving only continuous nodes, while the second, shown in Figure 2b, has undirected cycles among the continuous nodes; that is, some pairs of continuous nodes are connected by two distinct paths.
Figure 3 shows two additional cases (Figure 3a,b) in which the BNs contain a large number of discrete nodes. Each discrete node has four states and the continuous nodes are conditional Gaussians. Note that Figure 3a has no undirected cycles involving only continuous nodes, while Figure 3b has a loopy structure among the continuous nodes.
The experiments examined both CLG and CNG BNs based on these four CG BN structures. The size of the four CG BNs was varied by adjusting n. Some of the leaf continuous nodes were randomly selected as evidence nodes. All other nodes were unobserved.
Table 2 shows characteristics of these four CG BNs. In these BNs, the discrete nodes have four states. Therefore, in CG BNs 1 and 2, there are four discrete states for the discrete node A, while in CG BNs 3 and 4, there are 4^n total configurations of the discrete states, that is, 4^7 = 16,384 configurations when n = 7. When n = 7, CG BNs 1 and 4 contain 21 cycles, while CG BN 2 contains 501 cycles (cycles were counted with a cycle-finding algorithm [41]). CG BN 3 contains no cycles.
The following factors were varied in the experiment: (1) type of hybrid BN (i.e., BN 1, 2, 3 or 4; CLG or CNG BN), (2) type of inference algorithm (i.e., Hybrid Junction Tree (Hybrid-JT) [2,3], original Hybrid MP (HMP) [29], Hybrid MP with Gaussian Mixture Reduction (HMP-GMR) [28] or Likelihood Weighting (LW) sampling [10]), (3) number of repeated structures n and (4) algorithm characteristics (i.e., the number of GMM components allowed, the number of allowable message passing iterations and the maximum precision for the message passing algorithms). The dependent variables were accuracy of the result and execution time. For all experiments, the same convergence threshold max_prcs was used for HMP and HMP-GMR.
Using these settings, we conducted three experiments: (1) a comparison between HMP, HMP-GMR and Hybrid-JT investigated the scalability of HMP-GMR to complex networks (i.e., larger values of n), (2) HMP-GMR itself was evaluated for posterior distribution accuracy and execution time and (3) the optimal settings derived from the HMP-GMR-OS algorithm were evaluated for inference accuracy. The experiments were run on a 3.40 GHz Intel Core i7-3770 processor. The algorithms were implemented in the Java programming language (the source code is available online at "https://sourceforge.net/p/prognos/code/HEAD/tree/trunk/cps2/"). In the Java implementation of HMP-GMR, there was an additional exit point from the iteration of the HMP-GMR algorithm described in Section 3: in some cases, the computation combining Lambda and Pi values could not be performed due to numeric underflow. This happened when the HMP-GMR algorithm diverged. When this occurred, the HMP-GMR algorithm stopped and provided its current solution.
5.1. Scalability of HMP-GMR
The first experiment examined improvement in scalability of HMP-GMR over HMP and Hybrid-JT.
The initial settings for this experiment were (1) a maximum of 4 components output by GMR and (2) a limit of 100 iterations for each of HMP and HMP-GMR. Eight CG BNs (i.e., the conditional linear and nonlinear cases for CG BNs 1, 2, 3 and 4) were run with HMP, HMP-GMR and Hybrid-JT using the following inputs and outputs. The input value of n was varied from 1 to 10. The output was execution time.
Figure 4 and Figure 5 show the results of this experiment, summarizing the execution times (in milliseconds) for the CLG case as n is varied. The solid line denotes the HMP-GMR results, the dashed line denotes the HMP results and the dotted line denotes the Hybrid-JT results.
Results for CG BNs 1 (Figure 4a) and 2 (Figure 4b) show a similar pattern. The HMP algorithm with no GMR exceeded the time limit at n = 7 and n = 4, respectively. Execution times for HMP-GMR were higher than those for Hybrid-JT in both cases. The increase in execution time for both HMP-GMR and Hybrid-JT was linear in n.
In Figure 5, the results for HMP and Hybrid-JT show exponential growth in execution time, while the execution time of HMP-GMR increased linearly. For HMP, the execution time limit for CG BNs 3 (Figure 5a) and 4 (Figure 5b) was exceeded at n = 7 and n = 4, respectively. For Hybrid-JT, the execution time limit for CG BNs 3 and 4 was exceeded at n = 9.
Results for the CNG networks showed similar patterns and are not shown here for brevity. These experiments showed that HMP-GMR is scalable to large BNs for both linear and nonlinear CG networks. However, scalability alone is not sufficient. Accuracy and good operational performance are also essential.
5.2. Accuracy and Efficiency of HMP-GMR
In this experiment, we investigated the accuracy and convergence of HMP-GMR for the four CLG BNs. To evaluate the accuracy of HMP-GMR, exact inference results from Hybrid-JT were used. Some of the runs using Hybrid-JT stopped because of the exponential growth of components, so Hybrid-JT produced posterior distributions only for n ≤ 7. For this reason, this experiment used n = 7 for the four CLG BNs. Accuracy was measured by KL-divergence [38] (lower values mean better accuracy). We calculated the KL-divergence between the exact and approximate results for each unobserved node and summed them (henceforth, we use KL-divergence to mean the sum of KL-divergences over unobserved nodes). The number of runs in the experiment was 100. The maximum allowable number of components was max_nc = 2, the maximum number of iterations was max_iteration = 10,000 and the maximum execution time was max_time = 200,000 milliseconds (ms). In the experiment, there were three exit points: (1) when the algorithm converged, (2) when the time limit was exceeded and (3) when the algorithm diverged. When the algorithm did not converge, it halted and provided its current solution.
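Concretely, letting $U$ denote the set of unobserved nodes and $e$ the evidence, the error reported for a single run can be written (in our notation) as

$\mathrm{error} = \sum_{X \in U} D_{KL}\big( p_{\mathrm{JT}}(X \mid e) \,\|\, \hat{p}_{\mathrm{HMP\text{-}GMR}}(X \mid e) \big),$

where $p_{\mathrm{JT}}$ is the exact posterior from Hybrid-JT and $\hat{p}_{\mathrm{HMP\text{-}GMR}}$ is the approximate posterior from HMP-GMR.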
Figure 6 shows percentages for each CLG BN for which the algorithm converged, diverged and ran out of time. For CLG BN 1, the algorithm converged in 97% of the runs and ran out of time in 3% of the cases (execution time > 200,000 ms). For CLG BN 2, the algorithm converged in 52% of the runs, ran out of time in 3% of the runs and diverged in 45% of the runs. For CLG BN 3, the algorithm converged in 100% of the runs. For CLG BN 4, the algorithm converged in only 31% of the runs and ran out of time in 69% of the runs.
We observed that the large number of cycles (CLG BN 2) could cause many situations in which the algorithm did not converge (i.e., diverged or ran out of time). Also, the algorithm for the cases with no cycles (CLG BN 3) always converged. For the cases in which the algorithm ran out of time, if more time had been allowed it might have converged, diverged or failed to either converge or diverge (i.e., oscillated). In some cases for CLG BNs 1, 2 and 4, the algorithm oscillated until reaching the maximum execution time and halting. When there were many cycles and many discrete states (i.e., for CLG BN 4), the algorithm often ran out of time.
Table 3 shows averages (avg.) for KL-divergence over runs and average execution times for three cases (converged, diverged and ran out of time) on the four CLG BNs (numbers in parentheses are standard deviations).
Figure 7 shows the accuracy results from this experiment for the runs in which the algorithm converged. In Figure 7, the four lanes denote the four CLG BNs. For the case of convergence, the averages of KL-divergence over the runs were 0.0001, 1.04, 2.25 and 3.64 for CLG BNs 1, 2, 3 and 4, respectively, and the average execution times were 1514, 16,547, 2164 and 9584 ms for CLG BNs 1, 2, 3 and 4, respectively. For the case of convergence, because of the large number (501) of cycles in CLG BN 2, the inference algorithm required more time (avg. 16,547 ms) to converge than for the others. Also, the algorithm for CLG BN 4 spent more time (avg. 9584 ms) than for CLG BN 3 (avg. 2164 ms), because of the large number (21) of cycles in CLG BN 4. In the case of divergence, the algorithm for CLG BN 2 halted with numeric underflow in the Lambda or Pi value computation; in this case, the algorithm performed with very poor accuracy (avg. 106.35) and had a long execution time (avg. 9678 ms).
Some cases for CLG BNs 1, 2 and 4 stopped because the maximum execution time was exceeded. This never happened in CLG BN 3, which contained no cycles. In addition, accuracies for CLG BNs 1 and 2 were better than those for CLG BN 4.
The four sets of results depicted in Table 3 illustrate how network topology influences accuracy and execution time. In this experiment, we used arbitrary settings (i.e., max_nc = 2 and max_iteration = 10,000) for HMP-GMR. In the next section, we investigate whether the performance of HMP-GMR can be improved by optimizing these settings.
5.3. Optimal Settings for HMP-GMR
HMP-GMR requires initial settings (i.e., max_nc and max_iteration), which influence accuracy and execution time. To find good initial settings, the HMP-GMR-OS algorithm was introduced in Section 4. With this algorithm, we used the following experiment settings: (1) the Hybrid BNs (i.e., CLG BNs 1, 2, 3 and 4 with n = 7), (2) the maximum execution time max_time = 3000 ms, (3) the number of samples num_samples = 50, (4) the upper limit on the maximum allowable number of components ul_max_nc = 10, (5) the upper limit on the maximum number of iterations ul_max_it = 10 and (6) the Hybrid-JT algorithm to obtain correct inference results.
Figure 8 shows the results from this experiment obtained by the HMP-GMR-OS algorithm; results for CLG BNs 1, 2, 3 and 4 are shown in Figure 8a–d, respectively. Table 4 shows the minimum average KL-divergences in Figure 8 and the corresponding best values for max_nc and max_iteration (numbers in parentheses are standard deviations). For example, for CLG BN 1, the minimum average KL-divergence was 0.78 with a standard deviation of 3.37 at max_nc = 5 and max_iteration = 10. For CLG BN 4, the minimum average KL-divergence was 11.37 with a standard deviation of 8.44 at max_nc = 1 and max_iteration = 8.
For CLG BNs 1 and 3, the optimal number of iterations was the upper limit of 10; a better value might have been found if a larger number of iterations had been investigated. For CLG BNs 2 and 4, the best value for the maximum number of iterations was smaller than 10. Note that better results might be obtained if the number of samples were increased and/or the ranges of max_nc and/or max_iteration were expanded.
From this experiment, we observed that good accuracy can be achieved with a small number of components. Although the best values of max_nc found by our experiment were 5, 4, 6 and 1 for CLG BNs 1, 2, 3 and 4, respectively, the accuracy was not much better than when using a single component. For example, the average KL-divergence for CLG BN 1 was 0.78 at max_nc = 5 and max_iteration = 10, while it was 1.09 at max_nc = 1 and max_iteration = 10. To check whether the difference was statistically significant, paired t-tests were performed at the 5% significance level. Table 5 shows confidence intervals from the paired t-tests. From these tests, we observed that the difference between the setting found by HMP-GMR-OS and the setting max_nc = 1, max_iteration = 10 was not statistically significant for any of the CLG BNs.
This result suggests choosing max_nc = 1 as a default setting for HMP-GMR. This default setting was evaluated for accuracy against the Likelihood Weighting (LW) algorithm. Figure 9 shows the accuracy comparison between HMP-GMR with the default setting max_nc = 1 and LW sampling for the four CLG BNs. We used the following experiment settings: (1) type of hybrid BN (i.e., CLG BNs 1, 2, 3 and 4 with n = 7), (2) type of inference algorithm (i.e., HMP-GMR at max_nc = 1 and max_iteration = 10, or LW sampling), (3) 100 samples, (4) the maximum execution time max_time = 3000 ms and (5) the Hybrid-JT algorithm to obtain correct inference results.
Figure 9 shows the results from this experiment. When HMP-GMR did not converge, it stopped at the maximum execution time and provided its current solution. The chart contains eight lanes for four groups (CLG BNs 1, 2, 3 and 4). In the two adjoining lanes for each group, the left lane denotes the HMP-GMR case, while the right lane denotes the LW case. For example, the first lane in Figure 9 denotes the HMP-GMR case for CLG BN 1 (i.e., HG 1), while the second lane denotes the LW case for CLG BN 1 (i.e., LW 1). The execution times for the two cases in each group were set to similar values; that is, the number of samples for LW was controlled to achieve an execution time similar to that of HMP-GMR.
Table 6 shows averages of KL-divergences for the two algorithms (numbers in parentheses are standard deviations). For example, for CLG BN 1, an average KL-divergence from HMP-GMR was 0.57. For CLG BN 4, an average KL-divergence from LW was 14.29. The fourth row denotes a natural-log ratio between HMP-GMR and LW. In the comparison between HMP-GMR and LW, LW was better than HMP-GMR for CLG BN 2.
Figure 10 shows the accuracy comparison between LW and HMP-GMR for the four CLG BNs. For CLG BN 1, HMP-GMR provided much better accuracy than LW. For CLG BN 2, LW provided better accuracy than HMP-GMR. For CLG BN 3, HMP-GMR provided better accuracy than LW. For CLG BN 4, LW and HMP-GMR performed similarly but, as can be seen in Figure 9, the results for LW were more variable.
Accuracy from HMP-GMR was lower than that from LW for the CLG BN containing many loops: CLG BN 2 contained 501 cycles, while CLG BNs 1 and 4 contained 21 cycles and CLG BN 3 contained no cycles. More extensive investigations would be needed to determine whether this superiority of LW over HMP-GMR generalizes to arbitrary BNs with many loops. Accuracy from HMP-GMR did not depend much on the number of discrete states: CLG BN 1 contained four discrete states, while CLG BN 3 contained 16,384 configurations of the discrete states. For these networks, HMP-GMR provided better accuracy than LW regardless of how many discrete states the CLG BN contained.
We can consider whether to use HMP-GMR or LW according to the features of the CLG BN. The following list gives suggestions on when HMP-GMR should or should not be chosen, in terms of the number of configurations of the discrete states and the number of cycles.
Small number of configurations of the discrete states and small number of cycles: In this case, our experiment showed better accuracy from HMP-GMR in comparison with LW under a given time restriction. LW requires many samples to improve accuracy, while HMP-GMR uses a message passing approach, which can provide exact results for a polytree network [1]. In a simple-topology network (i.e., a small number of configurations of the discrete states and a small number of cycles), when HMP-GMR converges in time, it can provide high accuracy.
Small number of configurations of the discrete states and large number of cycles: When there are many cycles, HMP-GMR can diverge. This behavior depends on the network topology, the placement of continuous and discrete nodes, the conditional distributions and the pattern of evidence. When divergence occurs, HMP-GMR halts during message passing because of numeric underflow. A feature to detect when the pi and lambda messages are going out of bounds and stop the algorithm may be useful but intermediate results from before the algorithm diverges are of doubtful usefulness. In the case of a large number of cycles, LW can perform better than HMP-GMR.
Large number of configurations of the discrete states and small number of cycles: HMP-GMR bounds the maximum allowable number of components, which would otherwise grow with the number of configurations of the discrete states. Thus, HMP-GMR can tolerate a large number of configurations of the discrete states, while LW requires many samples when there are many configurations of the discrete states. In this case, we observed that for the four CLG BNs, HMP-GMR could converge and provide good accuracy.
Large number of configurations of the discrete states and large number of cycles: Although LW can tolerate many cycles, it can perform poorly when the necessary number of samples to achieve good accuracy is too large for the available time limit. Also, HMP-GMR can perform poorly because of many cycles. In this case, the better choice between LW and HMP-GMR may vary depending on specific features of the problem. For example, when the number of configurations of the discrete states is relatively smaller than the number of cycles, LW can be used. When the number of configurations of the discrete states is relatively larger than the number of cycles, HMP-GMR can be used. However, in any case, accuracies for both approaches may be low.