1. Introduction
The parameter identification of the stochastic dynamic systems given the indirect heterogeneous noisy observations is one of the most complicated and demanding problems in system analysis. Usually, this type of estimation appears for either the development and validation of the mathematical models with their subsequent analysis and forecasting or various versions of the stochastic control under incomplete information. Engineers often use the results of identification in mechanical and electronic systems, in navigation, biology and medicine, economics and finance as well as speech and image processing.
There exists both the series of classical monographs and the number of comprehensive surveys, reflecting the state-of-the-art in the stochastic identification theory. The hidden Markov models (HMMs) are the prospective ones [
1,
2,
3,
4]. First, they form the mathematical basis for the description of the processes in telecommunications [
5,
6], finance [
7,
8,
9], biology and medicine [
10,
11,
12,
13], tracking and navigation [
14,
15] and speech and image recognition [
16,
17,
18,
19]. Second, the HMM advanced theoretical framework provides a solution to a series of optimal estimation and control problems [
20].
The investigation object of this paper represents a subclass of continuous-time HMMs with the states described by the finite-state Markov jump processes (MJPs) given the diffusion observations. Researchers have used various methods and algorithms to identify parameters in these HMMs [
21,
22,
23,
24,
25,
26]; however, the most effective is the maximum likelihood method as realized by the Expectation-Maximization algorithm (EM algorithm) [
2,
27]. It was successfully used for parameter identification in various stochastic models and particularly for the separation of the Gaussian mixtures [
28]. To identify the HMM parameters, the authors of [
29,
30] introduced the corresponding version of the algorithm and proved the conditions of its convergence to a local maximum of the likelihood function. The essential assumption is the constant intensity of the Wiener observation noises.
The identified parameters include the transition intensity matrix (TIM) of the MJP and the conditional drifts in the observations. Note that the EM-procedure leans on the optimal filtering of the system state and some functionals of their trajectories. The paper aims to develop a new modification of the identification algorithm for the continuous-time HMM parameters with the state-dependent noise intensity in the observations. Such observation noises is also called multiplicative. Traditionally, the estimation problems in such systems are rarely investigated because of the known theoretical difficulties (see [
31,
32] for the details). The formal theoretical solution to the optimal filtering problem of the MJP state given the diffusion observations with multiplicative noises has been published recently [
33,
34] along with a complex of the stable numerical algorithms for their realization. The presented work is the natural evolution of that research.
The paper has the following organization.
Section 2 contains a formal description of the investigated observation system and the statement of its parameter identification problem. The section also includes a comparative analysis of the various problem formulations and the properties of the EM-estimates, calculated for the adjacent HMMs: the discrete-time systems and the continuous-time ones with the homogeneous observation Wiener noises. Additionally, the section presents some reasons in favor of the proposed version of the identification algorithm. It is based on the high precision fixed interval smoothing estimates of the hidden state.
Section 3 contains a numerically effective “two-filter smoothing” algorithm, which expresses the smoothed estimates as a symmetric function of the forward- and backward-time filtering ones. The calculation of the smoothed estimates represents the E-step of the EM algorithm. M-step itself and the whole recursive EM-procedure is introduced in
Section 4. The section contains the conversion of the
-optimal state estimates to the
-optimal ones and the formulae for the estimation of the required functionals of the MJP state trajectory. The section also presents pseudocode sketching the structure of the EM algorithm.
The solution to the optimal MJP state filtering given the diffusion observations with the multiplicative noises is a backbone of the identification algorithm. The theoretical estimator, being a variant of the generalized Bayes rule, contains the integrals by an abstract probability measure. They cannot be calculated analytically; hence,
Section 5 proposes their numerical approximations and the accuracy characteristics.
Section 6 contains a complex of illustrative numerical examples.
Section 6.1 presents the comparison of the filtering and fixed-interval smoothing estimates. The new version of the estimates is also compared with one from [
2], initially proposed for the processing of the discrete-time HMMs.
Section 6.2 contains diversified numerical analysis of the proposed identification estimates.
Section 7 contains our concluding remarks.
2. Problem Formulation
On the probability triplet with filtration
, we consider the partially observable HMM
where
is an unobservable state, which is a finite-state MJP with the state space ( stands for the set of all unit coordinate vectors of the Euclidean space ) with the transition intensity matrix (TIM) and initial distribution ; the process is an -adapted martingale; and
are the diffusion observations
discretized with a (small) time increment
;
is an
-adapted standard Wiener process characterizing the observation noise; and
f is an -dimensional observation matrix, whose columns define the conditional observation drift, whereas the collection of -dimensional matrices represents the conditional observation diffusion coefficient given .
The non-decreasing family of -algebras generated by the discretized observations up to the moment is denoted by , .
We suppose that the assumptions below hold for the system (
1) and (
2).
The HMM identification problem is to find the estimates of , f and given the observations .
Remark 1. Note that condition (4) can be weakened, leaving the MJP X ergodicity only. The identification problem of the system (
1), (
2) parameters has its inner history. It represents an approximation of the parameter estimation problem by the continuous-time diffusion observations (
3). The authors of [
29,
30] investigated the continuous-time system (
1), (
3) with scalar observations and additive noise (i.e., the case
). They developed a procedure of simultaneous MJP state filtering and system parameter identification based on the EM algorithm. The parameter identification requires the estimation of the number of
transitions
and the cumulative occupation time of the value
The definitions given in (
5) and (
6) refer, respectively, to the transitions and occupation time related to interval
.
The authors of the monograph [
2] investigated a discrete-time analog of (
1), (
3). The available observation is a vector-valued process corrupted by some multiplicative noise. The authors modified the EM algorithm and incorporated the fixed-interval smoothed estimates of
,
and the “level sums”
in it. Note, the mentioned estimates of
,
and
are not calculated by the two-filter smoothing algorithm, similar to [
37], due to the nature of the underlying processes.
The paper [
33] contains a solution to the optimal filtering of the MJP states given the vector observations with multiplicative noises. The authors suggested an observation transform that splits the initial ones into a diffusion with unit intensity, collection of counting processes and a set of random vectors observed at some non-random moments. The theoretical estimate represents the solution to the discrete-continuous stochastic differential system (SDS) with diffusion and counting processes on the right-hand side.
The article declares that identifiability condition 4 is sufficient for the precise restoration of the MJP state given the indirect noisy observations. Unfortunately, the solution to the SDS cannot be realized numerically due to the necessity to perform a double limit passage for the required observation transform. To overcome this obstacle, the authors suggested a collection of numerical filtering algorithms by the descretized observations. They investigated the performance of the proposed approximation and show that the algorithms deliver high precision to the MJP state estimates.
It is difficult to find the optimal estimates of the processes , and directly by the discretized observations. Nevertheless, it is possible to construct suboptimal ones based on the high precision two-filter smoothing estimates of the MJP state .
Below in the presentation, we use the following notations:
is a row vector of the appropriate dimensionality formed by units.
I is a unit matrix of the appropriate dimensionality.
is a indicator function of the set .
.
is a random number of the state transitions, occurred on the interval .
is a contraction of the probability measure to the -algebra .
is a random vector composed of the occupation times of X in each state during the interval .
is an -dimensional simplex in the space , is a distribution support of the vector .
is a “probabilistic simplex” formed by the possible values of .
is a conditional distribution of the vector
given
, i.e., for any
the following equality holds:
is a conditional distribution of
given
, i.e., for any
the following equality holds:
is a conditional distribution of
given
, i.e., for any
the following equality holds:
is a conditional distribution of
given
, i.e., for any
the following equality holds:
is an M-dimensional Gaussian probability density function (pdf) with the expectation m and nondegenerate covariance matrix K.
is an -dimensional vector of all HMM parameters.
is a natural filtration generated by the continuous-time process available up to the moment t.
3. Auxiliary Problem: Fixed Interval Smoothing Given Discretized Observations
The proposed HMM parameter identification algorithm uses the iterative calculation of the fixed-interval smoothing estimates
:
. For the effective numerical realization of the smoothing procedure, we suggest the two-filtering algorithm [
37,
38,
39].
First, we present an algorithm for the calculation of the forward-time filtering estimate .
Let us introduce the following positive random numbers and matrices:
The assertion below gives a recursive algorithm to calculate .
Proposition 1. The Conditional Expectation (CE) can be calculated by the formulawith the initial condition The proof of Proposition 1 is quite similar to one of Lemma 3 in [
33].
Second, we introduce the backward-time martingale representation of the MJP along with the backward-time filtering estimate. On the initial probability triplet we consider
The backward-time filtration : .
The backward-time collection of the discretized observations ; .
The probability distribution
, being the unique solution to the Kolmogorov equation
, has strictly positive components
for any
due to condition 3 above (see [
40]). Hence, we can define the backward-time TIM for the initial MJP
as follows
The assertion below declares a backward-time representation of the MJP .
Proposition 2. The MJP (1) satisfies the backward time SDS The processrepresents -adapted square integrable martingale: The proof of Proposition 2 can be found in [
41].
To define the backward-time filtering estimate, we introduce the positive random numbers and matrices:
One can obtain the backward-time filtering estimates
, which are calculated by an algorithm similar to the original forward-time one
with the initial condition
Third, in the assertion below, we present a numerically efficient two-filtering algorithm for the calculation of the fixed-interval smoothing MJP state estimate.
Proposition 3. The fixed-interval smoothing estimate , , has the formwhere the unnormalized estimate is calculated via the filtering estimates , and prior MJP distribution as The proof of Proposition 3 is similar to the one of Theorem 4 in [
39].
4. EM-Like Algorithm of HMM Parameter Identification
Before the description of the parameter identification algorithm, we suggest some hints for it.
Let us consider the continuous-time HMM (
1), (
3), for which both process
and
are observable. Due to the strong law of large numbers for MJP [
42] and regenerative processes [
43] the following strong limits hold as
:
The identity
is also true
-a.s. for any
on the set
. In the above formulae,
The processes
and
are determined by formulae (
5) and (
6).
The “level sum”
is a particular case of (
7):
One can treat identification formulae (
18) and (
19) as some application of the method of moments [
44]. Furthermore, [
33], if
is the unobservable state and identifiability condition 5 above holds, then it is possible to restore
precisely given the diffusion observations
:
Thus, in “the ideal situation” the CE
allows identifying the HMM parameters
. However, two facts block the direct implementation of this idea. The estimate
represents a solution to the SDS with both the diffusion and counting processes on the right-hand side. The SDS uses some transformation of the original observations (
3), which is a result of some double limit passage. There is no effective numerical procedure for its realization. Further, the CE
depends on the exact values of
,
f and
, which are unknown.
We perform several steps to overcome this obstacle and develop a suboptimal identification algorithm. First, under rather mild conditions [
45], there exists a continuous dependency of the solution to the SDS on the parameters, i.e., if the sequence of parameters
converges to
as
, then the sequence of the corresponding SDS solutions
as
. This means the performance of the estimate
is close to the CE
when the used parameters
are close enough to the exact values
.
Second, in the case of the known parameters
and the sequence of the nested partitions
, due to Levy theorem [
40],
a.s. as
. This means that, under the identifiability condition 5 above, the optimal filtering estimate
calculated by the discretized observations
is close to the exact MJP state
, and the optimal smoothed estimate
is even more so.
Third, both the trajectories of the state
and CE
given the diffusion observations
(
3) are piecewise constant ones with values in
. By contrast, the CE
calculated by the discretized observations
(
2) belongs to the “probabilistic simplex”
. One can neutralize this issue rather easily, replacing the estimate
with the maximum a posteriori probability (MAP) estimate
:
The CE is the -optimal estimate, whereas is the -optimal one.
Fourth, Formulae (
18)–(
20) with the processes
,
and
substituted by their CEs, can be the base for the iterative identification procedure. Indeed, on the first step of the loop, the smoothed estimate
is calculated given the previous values of the HMM parameters
. Based on
, one can compute the required auxiliary estimates
,
and
:
Using them, in the second step, one can recalculate the HMM parameters
by the modifications of (
18), (
19) and (
20):
The authors of [
2,
29,
30] presented the versions of EM algorithm for the continuous-time HMM (
1), (
3) parameters in the case of additive observation noise (
). It is shown that the formulae (
18) and (
19) establish the maximization step. Moreover, in [
29,
30], it is shown that, under the condition
for all admissible parameter values
and
, the recursive identification procedure converges to a local maximum of the conditional likelihood function.
Below (see Algorithm 1) is a sketch of the EM like algorithm of the discrete-continuous HMM (
1), (
2) parameter identification. The input parameters include the initial values of parameters
, maximal number of iterations
and minimal relative parameter variation per iteration
.
Algorithm 1: The algorithm for HMM parameter identification. |
- 1:
▹ Required accuracy index: initialization - 2:
▹ Maximal number of iterations: initialization - 3:
for to N do ▹ Parameter initialization: begin - 4:
for to M do - 5:
- 6:
for to M do - 7:
- 8:
end for - 9:
end for - 10:
for to N do If - 11:
- 12:
end for - 13:
- 14:
end for ▹ Parameter initialization: end - 15:
- 16:
repeat ▹ Identification loop: begin - 17:
- 18:
- 19:
- 20:
- 21:
for to R do ▹ Forward and backward time filtering: begin - 22:
formulae ( 8), ( 9) - 23:
formulae ( 13), ( 14) - 24:
end for ▹ Forward and backward time filtering: end - 25:
for to R do ▹ Smoothing and CE to MAP conversion: begin - 26:
formulae ( 16), ( 17) - 27:
formula ( 23) - 28:
end for ▹ Smoothing and CE to MAP conversion: end - 29:
for to N do ▹ HMM Parameter recalculation: begin - 30:
for to M do - 31:
formulae ( 24)–( 26), ( 28) - 32:
for to M do - 33:
formulae ( 24)–( 26), ( 29) - 34:
end for - 35:
end for - 36:
for to N do If() - 37:
formulae ( 24)–( 27) - 38:
end for - 39:
- 40:
end for ▹ HMM Parameter recalculation: end - 41:
until ▹ Identification loop: end return
|
5. Numerical Schemes of Filtering Estimate Calculation
Neither the forward-time conditional distribution
nor the backward-time one
are absolutely continuous with respect to the Lebesgue measure; hence, the practical calculation of the integrals (
8) and (
13) is nontrivial. Due to the law of the total probability,
The results of [
33] hint that, for small enough time increments
, the sum in the last formula can be limited by only two summands (
and 1). In this case, we replace the estimate
with its analytic approximation of order 1:
where
From the probabilistic point of view, this means that we take into account at most one state transition occurring at the discretization interval
. If
and
is so small that
, then (see [
33])
and the filtering approximation error is characterized by the inequality
The explicit form of
is
where
is the Kronecker symbol, and
The integral in (
33) cannot be calculated analytically; thus, as the analytic approximation
, we have to calculate it numerically, appending some error. We suggest using the composite method of the midpoint rectangles with the substep
for some
:
Denoting
, we present the numerical approximation of order
based on the composite midpoint rectangle scheme:
The total approximation error of
is characterized by the inequality [
34]
with some positive constants
and
. Thus, the order of the numerical approximation is at most 1.
Above in this section, we discuss the numerical approximation of the forward-time filtering estimate. A similar framework is applicable for the numerical realization of backward-time filtering. As in the forward-time, we assume at most one MJP state transition occurred on the time increment. We also have to consider the nonhomogeneous nature of the MJP
viewed in the reversed time. The analogue of formula (
33) has the form
where
Both the first summand in (
37) and the function
(
38) contain integrals; therefore, we replace them with the following midpoint approximations:
Finally, the backward-time analogue of
, which is valid for the numerical realization, takes the form
Denoting
, we present the numerical approximation of order
based on the composite midpoint rectangle scheme for backward-time filter:
6. Numerical Experiments
To illustrate the properties of the proposed estimation algorithms, we performed a complex of numerical experiments. We fixed the values for some basic parameters of HMM (
1), (
2). Varying some of them, we fulfilled the calculation, and the obtained results help us to highlight various performance characteristics of the suggested estimates.
We chose an HMM with the following parameters:
,
,
6.1. Filtering vs. Smoothing: Estimation Performance Analysis
The HMM state filtering numerical algorithm is a backbone of subsequent HMM parameter identification; thus, the high precision of the proposed filtering estimates is a vital requirement.
The optimal filtering algorithm of the discrete-time HMM states given the observations with state-dependent noises is presented in ([
2], chap. 3). Its adaptation to the time-discretized observations is a specific case of (
30), with the matrix
calculated other than by (
35).
We compare the proposed filtering and smoothing estimates, calculated by the composite midpoint rectangle scheme and the ones from [
2]. Both numerical schemes are stable. The difference can be exposed clearly, when the time-discretization step is sufficiently large. The numerical experiment was conducted for the HMM with the basic parameter values (
42) and the following auxiliary parameter values: the time interval
, the time-discretization step
, the MJP state simulation step
and the step of the numerical integration in (
35)
. The estimation error variance was calculated by the Monte-Carlo method over the sample of the size
.
, , —the sample variances of the forward-time filtering estimate, backward-time and fixed-interval smoothing one, calculated by the algorithms proposed in the paper.
,
,
—the sample variances of the forward-time filtering estimate, backward-time and fixed-interval smoothing one, calculated by the algorithms proposed in [
2].
Figure 1.
Sample error variances of the proposed estimates and the ones from [
2].
Figure 1.
Sample error variances of the proposed estimates and the ones from [
2].
Figure 2.
Available discretized observations, true MJP state and its filtering and smoothing estimates.
Figure 2.
Available discretized observations, true MJP state and its filtering and smoothing estimates.
The chosen initial distribution of the MJP
coincides with the stationary one, hence, the vector of component variances is
. The combination of the high transition intensity and the large discretization step (
), along with the absence of the observation drift (
f is a zero vector) and the slightly different noise intensities
establish poor conditions for the state estimation; hence, one can expect for the variances of estimate errors to be close to the ones of the MJP itself. In this numerical experiment, both the filtering and smoothing algorithms are better than the adaptation of the discrete-time algorithms [
2]. Moreover, looking at the sample variances of the second MJP component, one can see that the computational errors become so high that the smoothing demonstrates worse performance than the filtering does.
To compare the performance of the filtering estimates and the smoothing one, the next numerical experiment is fulfilled with the following time steps: , and .
Figure 2 presents specific trajectories of the available observations, components of the accurate MJP state, their forward- and backward-time filtering estimates and the fixed-interval smoothing ones. One can see the superiority of the smoothing estimate performance.
Condition 5 is valid for the HMM with parameters (
42); hence, we can expect convergence of both the filtering and smoothing estimates to the accurate state value as the time discretization step vanishes.
Figure 3 presents sample error variances for all types of estimates (forward-time filtering
, backward-time-filtering
and fixed-interval smoothing ones
) calculated with various time discretization steps
. One can see the expected convergence.
The results of the first stage of the numerical experiment allow us to make the following conclusions.
Both the proposed numerical approximation based on the composite midpoint rectangle scheme (
35) and the adaptation of the algorithm from [
2] demonstrate the stable character: both the estimates have non-negative components and satisfy the normalization property. Their performances look similar; however, (
35) demonstrates more stability in the case of the large discretization step (
).
The drift f is absent in the observations. However, all conditional noise intensities are different; therefore, identifiability condition 5 is valid for this HMM. The calculation results confirm the convergence of the proposed estimates to the real MJP state as .
The quality of the fixed-interval smoothing estimates is significantly better than the one of the filtering estimates: the error variances are distinguished about twice.
6.2. Parameter Identification
We performed the complex of the identification numerical experiments for the HMM with parameters (
42), where the observation drift vector
f is replaced by
. This change allows us to illustrate the identification performance of all HMM parameters, including the drift
f. The observations are available on the time interval
. We simulate the “continuous-time” state with the small step
, and use the step
in the numerical integration scheme (
40).
To analyze how the time discretization of observations affects the parameter identification performance, we choose the following time step values: and . The number of iteration is bounded: with the minimal relative parameter variation per iteration .
Table 1 contains the identification results of the TIM
,
Table 2 stands for the results of the drift
f identification, and
Table 3 stands for the ones of the diffusion
g. All the tables include
The exact parameter values (, f and g).
The starting points for the iterative identification procedure (, and ).
The obtained estimates (, and ).
Table 1.
Identification results of the TIM .
Table 1.
Identification results of the TIM .
| | | | | Relative Errors (%) |
---|
| −18.0 | 12.0 | 6.0 | −18.6 | 11.6 | 7.0 | −1.0 | 0.5 | 0.5 | −14.6 | 7.4 | 7.2 | 18.9 | 38.3 | 20.0 |
10−2 | 9.0 | −18.0 | 9.0 | 8.7 | −17.2 | 8.5 | 0.5 | −1.0 | 0.5 | 10.0 | −16.0 | 6.0 | 11.1 | 11.1 | 33.3 |
| 6.0 | 12.0 | −18.0 | 6.0 | 11.2 | −17.2 | 0.5 | 0.5 | −1.0 | 9.7 | 5.3 | −15.0 | 61.7 | 55.8 | 16.7 |
| −18.0 | 12.0 | 6.0 | −18.9 | 11.8 | 7.1 | −1.0 | 0.5 | 0.5 | −16.6 | 10.1 | 6.5 | 7.8 | 15.8 | 8.3 |
2 × 10−4 | 9.0 | −18.0 | 9.0 | 8.9 | −17.5 | 8.6 | 0.5 | −1.0 | 0.5 | 7.6 | −14.6 | 7.0 | 15.6 | 18.9 | 22.2 |
| 6.0 | 12.0 | −18.0 | 6.0 | 11.5 | −17.5 | 0.5 | 0.5 | −1.0 | 5.5 | 9.8 | −15.3 | 8.3 | 18.3 | 15.0 |
| −18.0 | 12.0 | 6.0 | −18.9 | 11.8 | 7.1 | −1.0 | 0.5 | 0.5 | −17.4 | 11.1 | 6.3 | 3.3 | 7.5 | 5.0 |
10−4 | 9.0 | −18.0 | 9.0 | 8.9 | −17.5 | 8.6 | 0.5 | −1.0 | 0.5 | 8.2 | −15.6 | 7.4 | 8.9 | 13.3 | 17.8 |
| 6.0 | 12.0 | −18.0 | 6.0 | 11.5 | −17.5 | 0.5 | 0.5 | −1.0 | 5.6 | 9.8 | −15.4 | 6.7 | 18.3 | 14.4 |
| −18.0 | 12.0 | 6.0 | −18.9 | 11.8 | 7.1 | −1.0 | 0.5 | 0.5 | −18.4 | 11.4 | 7.0 | 2.2 | 5.0 | 16.7 |
10−5 | 9.0 | −18.0 | 9.0 | 8.9 | −17.5 | 8.6 | 0.5 | −1.0 | 0.5 | 8.5 | −16.9 | 8.4 | 5.6 | 6.1 | 6.7 |
| 6.0 | 12.0 | −18.0 | 6.0 | 11.5 | −17.5 | 0.5 | 0.5 | −1.0 | 6.1 | 11.0 | −17.1 | 1.7 | 8.3 | 5.0 |
Table 2.
Identification results of the observation drift elements f.
Table 2.
Identification results of the observation drift elements f.
| f | | | Relative Errors (%) |
---|
| −10.0 | 5.0 | 20.0 | −1.0 | 0.0 | 1.0 | −9.9 | 5.1 | 20.1 | 1.0 | 2.0 | 0.5 |
| −10.0 | 5.0 | 20.0 | −1.0 | 0.0 | 1.0 | −10.1 | 5.0 | 20.3 | 1.0 | 0.0 | 1.5 |
| −10.0 | 5.0 | 20.0 | −1.0 | 0.0 | 1.0 | −10.1 | 5.0 | 20.2 | 1.0 | 0.0 | 1.0 |
| −10.0 | 5.0 | 20.0 | −1.0 | 0.0 | 1.0 | −10.0 | 4.9 | 20.0 | 0.0 | 2.0 | 0.0 |
Table 3.
Identification results of the observation diffusion elements g.
Table 3.
Identification results of the observation diffusion elements g.
| g | | | Relative Errors (%) |
---|
| 0.1 | 0.2 | 0.3 | 0.05 | 0.15 | 0.4 | 0.1 | 0.2 | 0.3 | 0.0 | 0.0 | 0.0 |
2 × 10−4 | 0.1 | 0.2 | 0.3 | 0.05 | 0.15 | 0.4 | 0.1 | 0.2 | 0.3 | 0.0 | 0.0 | 0.0 |
| 0.1 | 0.2 | 0.3 | 0.05 | 0.15 | 0.4 | 0.1 | 0.2 | 0.3 | 0.0 | 0.0 | 0.0 |
| 0.1 | 0.2 | 0.3 | 0.05 | 0.15 | 0.4 | 0.1 | 0.2 | 0.3 | 0.0 | 0.0 | 0.0 |
Additionally, we compare the TIM estimate
with an estimate
, calculated by (
18) given the MJP state
. Clearly,
is unobservable directly; hence,
can be treated as some “ideal estimate”.
Below, we present the evolution of the HMM parameter estimates depending on the iteration number. The results are obtained for the discretization step
.
Figure 4 contains the estimates of the off-diagonal TIM
coefficients in comparison with their exact values,
Figure 5 contains the analogous information concerning the drift coefficients
f, and
Figure 6 presents the identification results of the diffusion coefficients
g.
The identification quality can be illustrated in another way. One can build the histogram using the time-discretized observations as a sample. If the time step is small enough, then the theoretical pdf can be approximated by a mixture of the Gaussians , where the vector is the stationary distribution of the MJP with the TIM .
In this case, the approximation precision is
. We cannot construct the “theoretical pdf” because the true values of HMM parameters are unknown. Nevertheless, we can build its approximation
, where
and
are identified values of the drift and diffusion and
is the stationary distribution of a MJP with the TIM
.
Figure 7 contains all the pdf estimates mentioned above, and the calculation is performed for the time increment
.
The results of the second stage of the numerical experiment allow us to make the following conclusions.
All estimates demonstrate high precision. Even for the relatively large time increment and the corresponding small observation amount , the obtained estimates are precise.
The iterative identification procedure converges quickly. In all numerical experiments, the iteration number does not exceed 15, and the identification procedure is terminated by the decline of the minimal relative parameter variation per iteration less than the threshold .
The precision of the parameter estimates depends on the time increment . The most impact of the declining to the identification performance can be seen in the estimates of the drift f and diffusion g parameters.
The estimation of the transition intensity matrix was found to be a challenging problem. Its identification precision demonstrates low dependency on the time increment value. Under a fixed observation interval , the estimate cannot be precise more than the “ideal estimate” calculated by the precise knowledge of the state trajectory.
7. Conclusions
The paper presented a complete technological circuit to identify continuous-time HMM parameters. We included the identification problem formulation, which is proper from the mathematical point of view. The article contains a theoretical foundation of the solution and a numerical algorithm of the HMM parameter identification. We tested the proposed identification procedure over the extended complex of the numerical experiments. The numerical study confirmed the applicability of the current version of the identification algorithm and the high precision of the calculated estimates.
Nevertheless, we consider some directions for the further refinement of the algorithm and its theoretical basis. First, the authors of [
29,
30] have proven the convergence of the identification procedure to a local maximum of the conditional likelihood function. Let us remember that they considered the stochastic differential HMM with a scalar observable process and additive observation noises (the case
). The keystone of the proof is the absolute continuity of the probability measures connected using a Girsanov transform. The continuity does not hold in the case of the continuous-time HMM with the observations corrupted by the multiplicative noise. Nevertheless, it is highly likely that this property will be provable for the time-discretized ones. Potentially, this would prove the convergence of the identification procedure and control the value of the conditional likelihood function during the iterations.
Second, the proposed algorithm has a significant drawback: it does not admit direct parallelization. This is an inherent feature of all identification algorithms based on fixed-interval smoothing, including the classic Baum–Welch algorithm [
21,
22]. Nevertheless, the proposed two-filter version of the fixed-interval smoothing admits double parallelization of the forward- and backward-time filtering procedures in each iteration step. Furthermore, the identification process could converge to a local maximum of the conditional likelihood function, depending on the starting point. The identification procedure admits the parallelization in the following sense: several exemplars of the identification procedure are performed on the independent processors given the same observation bulk and different starting points. Then, we can choose one of the calculated results, using either the likelihood function or some additional information concerning the estimated parameters.
Third, the current version of the algorithm can be modified to incorporate various constraints and describing the admissible set of the HMM parameters. We can illustrate this idea with the above-stated problem. It contains inequality (
4), which postulates strong positivity for all off-diagonal elements of the TIM
. This constraint defines an open admissible set of
values. Theoretically, this leads to the absence of the maximum likelihood TIM estimate for certain samples. Practically, this appears as the convergence of the estimates to a matrix with some zero off-diagonal elements. To avoid this situation, one should replace the initial open set of admissible parameter values with some closed subsets, using additional barriers. They can be either stationary or variable during the iterative process to preserve the calculated estimates from leaving the admissible set.