3.1. Inference Process Based on the Nonlinear Membership Function
In fault diagnosis, all belief rules constitute a knowledge base. After the input information of the fault indicators is obtained, the fault mode can be diagnosed based on this knowledge base. It is worth noting that, as a fuzzy system, belief rules in BRB are expressed as mappings to linguistic values. However, the observation data of fault indicators are mostly quantitative information. Therefore, it is necessary to convert quantitative observation data into membership degrees of the linguistic referential grades through a membership function; this process is the so-called "fuzzification", expressed as follows:
where $A_{i,j}$ is the $j$th referential grade of the $i$th fault indicator, $\alpha_{i,j}$ is the corresponding membership degree, and $x_i$ $(i = 1, 2, \dots, M)$ are the quantitative observation data.
In previous BRB models, the most commonly applied membership function is the triangular membership function, which is used in rule (or utility) based transformation methods, as shown below:
where $a_{i,j}$ is a quantitative value corresponding to $A_{i,j}$, which is usually determined by experts.
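As an illustration of this rule (utility) based transformation, the following minimal Python sketch converts a reading into membership degrees over expert-given referential values; the function and variable names are illustrative, not taken from the paper.

```python
# Hedged sketch of the rule (utility) based input transformation: a reading x
# is converted into membership degrees over sorted referential values
# a[0] < a[1] < ... < a[J-1] supplied by experts.

def triangular_fuzzify(x, a):
    """Return membership degrees of x over the referential values in a."""
    degrees = [0.0] * len(a)
    if x <= a[0]:                     # below the lowest grade
        degrees[0] = 1.0
        return degrees
    if x >= a[-1]:                    # above the highest grade
        degrees[-1] = 1.0
        return degrees
    for j in range(len(a) - 1):
        if a[j] <= x <= a[j + 1]:     # x falls between two adjacent grades
            degrees[j] = (a[j + 1] - x) / (a[j + 1] - a[j])
            degrees[j + 1] = 1.0 - degrees[j]
            break
    return degrees
```

A reading halfway between two referential values receives a membership of 0.5 for each, and the degrees within the active interval always sum to one.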
The curve of the above membership function is shown in Figure 1. It can be seen from Figure 1 and Equation (4) that, since the derivative of this function is constant, the membership degree of each referential grade changes linearly with the input. However, when the observation data are unevenly distributed, for example concentrated in a certain region, this membership function cannot accurately reflect the corresponding membership, as shown in Figure 2. It can be seen from Figure 2a that the observation data are distributed evenly between the two referential grades, so it is natural that the membership degree changes linearly in this case. In Figure 2b, however, the observation data are concentrated near the referential grade $A_{i,j+1}$. Therefore, the two points marked by the red dotted line should have different membership distributions. The membership of the yellow point is assigned as 0.5 for each grade. For the green point in Figure 2b, since this point is comparatively close to $A_{i,j}$ within the whole dataset, its membership degree of $A_{i,j}$ should be greater than 0.5 and its membership degree of $A_{i,j+1}$ should correspondingly be less than 0.5.
For example, suppose the two referential grades in Figure 2 correspond to the semantic values "low" and "high", respectively. The point marked in red in Figure 2b represents a relatively low value within the entire dataset; therefore, its membership degree of the referential grade "low" should be higher.
Inaccurate fuzzification of quantitative data reduces the modeling accuracy of the fault diagnosis model. Thus, a nonlinear membership function is proposed in this paper for the fuzzification of unevenly distributed quantitative data, which can be described as follows:
where $\tau$ is the parameter of the function, which reflects the distribution of the observation data.
With the change of $\tau$, the new membership function can adaptively reflect the impact of different data distributions. For example, when $\tau$ is 0.25, 0.5, 1, 2 and 5, respectively, the curves of the membership function are shown in Figure 3. It can be seen that, as $\tau$ increases, the function changes from convex to concave. In particular, when $\tau$ equals 1, the nonlinear membership function degenerates into the triangular membership function. Correspondingly, for the data distribution in Figure 2b, the nonlinear membership function with $\tau = 0.25$ or $\tau = 0.5$ can fuzzify the quantitative data more accurately.
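A purely illustrative Python sketch of a membership family with the qualitative behaviour described here (convex for small $\tau$, concave for large $\tau$, triangular at $\tau = 1$) is given below. The specific power-law expression is an assumption for demonstration, not necessarily the paper's exact function, and all names are illustrative.

```python
# Illustrative only: a power-law membership family with the behaviour described
# in the text -- convex for small tau, concave for large tau, and reducing to
# the triangular membership function at tau = 1. This expression is an
# assumption for demonstration, not the paper's own equation.

def nonlinear_fuzzify(x, a_low, a_high, tau):
    """Membership degrees of x over two adjacent referential grades."""
    u = (x - a_low) / (a_high - a_low)   # normalized position in [0, 1]
    u = min(max(u, 0.0), 1.0)
    mu_high = u ** (1.0 / tau)           # membership of the upper grade
    return 1.0 - mu_high, mu_high        # the two degrees sum to 1
```

Under this sketch, with $\tau = 0.5$ a point midway between the grades receives the degrees (0.75, 0.25), so the lower grade dominates, which mirrors the uneven case of Figure 2b.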
It is worth noting that, as another commonly used membership function in fuzzy systems, the Gaussian membership function can also realize adaptive fuzzification of input data through changes in expectation and standard deviation, as shown below:
where $\mu$ is the expectation and $\sigma$ is the standard deviation.
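For comparison, the Gaussian membership function can be sketched as follows; the variable names are illustrative.

```python
import math

# Sketch of the Gaussian membership function
#   mu(x) = exp(-(x - mean)^2 / (2 * std^2)),
# where mean is the expectation (placed at a referential value) and std is
# the standard deviation.

def gaussian_membership(x, mean, std):
    """Membership degree of x for a grade centered at mean with spread std."""
    return math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))
```

Note that the value peaks at 1 exactly at the expectation and decays with the squared distance, so for a large standard deviation the membership remains high even far from the expectation.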
However, it has two shortcomings. Firstly, the Gaussian membership function cannot accurately transform uniformly distributed input information, which is limited by the characteristics of its nonlinear curve; in contrast, the nonlinear membership function proposed in this paper avoids this problem when $\tau = 1$. Secondly, the adaptive ability of the Gaussian membership function is insufficient. For a data distribution within an interval, the Gaussian membership function can adapt to the data distribution only in some circumstances, as shown in Figure 4, where its standard deviation is 0.25, 0.5, 1, 2 and 5, respectively. As the variance increases, the shape of the exponential curve cannot properly reflect the distribution characteristics of a dataset concentrated close to $A_{i,j+1}$. Furthermore, when the input is at $A_{i,j+1}$, the membership degree of $A_{i,j}$ is still quite high, which is difficult for users to interpret. Therefore, the Gaussian membership function is applicable only when the observation data are concentrated near the referential grade $A_{i,j}$. Based on the above analysis, the nonlinear membership function proposed in this paper can more accurately reflect the distribution of the data.
Therefore, based on the nonlinear membership function, when observation data are obtained, the steps for fault diagnosis can be described as follows:
Step 1: Fuzzification of quantitative data. The referential grades of each fault indicator form a fuzzy partition, and each grade is assigned the nonlinear membership function. For the observation data of the $i$th fault indicator, the membership degree of each referential grade is calculated as follows:
where $x_i$ $(i = 1, 2, \dots, M)$ are the input data and $\tau$ is the parameter of the nonlinear membership function, which is usually determined by experts after observing the data distribution or calculated by statistical methods.
Step 2: Activation of belief rules. The activation weight $w_k$ of the $k$th rule is calculated from the rule weight $\theta_k$, the normalized attribute weights $\bar{\delta}_i$ and the membership degrees obtained in Step 1.
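In the standard BRB (RIMER) formulation, the activation weight takes the following form, where the max-normalization of the attribute weight corresponds to Equation (9):

```latex
w_k = \frac{\theta_k \prod_{i=1}^{M}\left(\alpha_i^{k}\right)^{\bar{\delta}_i}}
           {\sum_{l=1}^{L} \theta_l \prod_{i=1}^{M}\left(\alpha_i^{l}\right)^{\bar{\delta}_i}},
\qquad
\bar{\delta}_i = \frac{\delta_i}{\max_{j=1,\dots,M}\delta_j}
```

Here $\alpha_i^{k}$ denotes the individual matching degree of the $i$th fault indicator in the $k$th rule, and $L$ is the number of rules.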
Step 3: Reasoning of activated rules. In this paper, the analytic ER algorithm [
30] is used to fuse the activated rules to obtain the belief degree of each failure mode as follows:
where $\beta_n$ represents the belief degree of the $n$th failure mode $D_n$ $(n = 1, 2, \dots, N)$.
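A minimal Python sketch of the analytic ER rule combination is given below, assuming the standard analytic form of the ER algorithm; `weights` are the activation weights from Step 2, `beliefs[k][n]` is the basic belief degree of the $k$th activated rule in the $n$th failure mode, and all names are illustrative.

```python
# Sketch of the analytic ER combination (standard analytic form): activated
# rules with activation weights w[k] and basic belief degrees beliefs[k][n]
# are fused into an overall belief degree for each failure mode.

def er_fuse(weights, beliefs):
    """Fuse K activated rules over N failure modes; returns N belief degrees."""
    K, N = len(weights), len(beliefs[0])
    totals = [sum(b) for b in beliefs]      # total assigned belief per rule
    prod_n = [1.0] * N                      # prod_k (w_k*b_nk + 1 - w_k*tot_k)
    prod_res = 1.0                          # prod_k (1 - w_k*tot_k)
    prod_unw = 1.0                          # prod_k (1 - w_k)
    for k in range(K):
        for n in range(N):
            prod_n[n] *= weights[k] * beliefs[k][n] + 1.0 - weights[k] * totals[k]
        prod_res *= 1.0 - weights[k] * totals[k]
        prod_unw *= 1.0 - weights[k]
    mu = 1.0 / (sum(prod_n) - (N - 1) * prod_res)   # normalization factor
    denom = 1.0 - mu * prod_unw                     # removes residual weight mass
    return [mu * (prod_n[n] - prod_res) / denom for n in range(N)]
```

When every activated rule carries a complete belief distribution, the fused degrees sum to one, and the index of the largest entry gives the diagnosed fault mode.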
In general, the failure mode with the highest belief degree is regarded as the most likely failure and is taken as the output of the model as follows:
where $y$ indicates the diagnosed fault mode.
3.2. Model Optimization Based on the Gradient Descent Method
Due to the subjectivity and fuzziness of expert knowledge, the modeling accuracy of the initially constructed fault diagnosis model generally cannot meet the requirements of practical engineering. Therefore, the model parameters initially determined by experts in the BRB need to be optimized to improve the diagnostic accuracy of the model. In general, for classification problems such as fault diagnosis, the cross-entropy loss function is used as the objective function as follows:
where $y_t$ indicates the category of the real fault of the $t$th sample, $T$ is the capacity of the observation dataset, and $\Omega$ is a parameter vector composed of the rule weights, attribute weights, basic belief degrees and the parameters of the membership function.
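A cross-entropy objective of this standard kind can be written as follows, where $\beta_{y_t}(x_t; \Omega)$ denotes the belief degree the model assigns to the true fault category $y_t$ of sample $x_t$:

```latex
\min_{\Omega}\; E(\Omega) = -\frac{1}{T}\sum_{t=1}^{T} \ln \beta_{y_t}\!\left(x_t;\, \Omega\right)
```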
Considering the constraint conditions of parameters in BRB model, the following parameter optimization model can be constructed:
In recent years, many optimization algorithms have been developed for BRB model parameter optimization, such as DE, PSO, P-CMAES and other swarm intelligence algorithms. Yang et al. [
31] pointed out that when BRB is used as an expert system, the optimization of model parameters should only be “fine-tuning”, which is also a major difference between BRB and artificial neural networks. Feng et al. [
32] pointed out that, due to the population initialization operation of swarm intelligence algorithms, the expert knowledge in BRB is likely to be destroyed and the reasoning results may conflict with intuition. This can make the fault diagnosis results unconvincing and weaken the interpretability of the model. Compared with swarm intelligence algorithms, the gradient descent method directly uses derivative information and takes the parameters initially determined by experts as the starting point of the search, retaining the initial expert knowledge to the greatest extent. Therefore, exploiting the differentiability of the BRB reasoning process, this paper uses an optimization algorithm based on gradient descent to train the model.
There are four types of parameters serving as optimization variables. Therefore, it is necessary to calculate the first-order partial derivatives of the objective function with respect to them.
First, the first-order partial derivative of the objective function $E(\Omega)$ with respect to the reasoning result $\beta_n$ is calculated as follows:
The first-order partial derivative of the reasoning result $\beta_n$ with respect to the basic belief degree $\beta_{n,k}$ is:
where
So far, the first-order partial derivative with respect to the first type of parameter has been calculated as follows:
Then, we need to calculate the first-order partial derivatives with respect to the rule weight, the attribute weight and the parameter of the membership function. According to the chain rule, the first-order partial derivative of the reasoning result $\beta_n$ with respect to the activation weight $w_k$ needs to be obtained as follows:
where
The first-order derivative of the activation weight $w_k$ with respect to the rule weight $\theta_k$ is calculated as follows:
For the attribute weight $\delta_i$, the normalization of this parameter in Equation (9) is nondifferentiable. Therefore, only the first-order partial derivative with respect to the normalized attribute weight $\bar{\delta}_i$ can be calculated here. First, the first-order derivative of the activation weight $w_k$ with respect to the rule matching degree $\alpha_k$ needs to be calculated:
Then, the first-order derivative of the rule matching degree $\alpha_k$ with respect to the normalized attribute weight $\bar{\delta}_i$ is calculated as follows:
Therefore, according to the chain rule, the partial derivative of the objective function with respect to the rule weight and the normalized attribute weight can be calculated as follows:
Finally, the first-order partial derivative of the individual membership degree $\alpha_i^{k}$ with respect to the parameter of the membership function $\tau$ is calculated as follows:
According to the chain rule, the partial derivative of the objective function with respect to the parameters of the membership function is calculated as follows:
Therefore, the gradient vector of the optimization variable can be obtained as follows:
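Schematically, the derivatives computed above compose by the chain rule; for instance, for the rule weight $\theta_k$ and the membership parameter $\tau$ (a summary sketch consistent with the steps listed):

```latex
\frac{\partial E}{\partial \theta_k}
= \sum_{n=1}^{N}\frac{\partial E}{\partial \beta_n}\,
  \frac{\partial \beta_n}{\partial w_k}\,
  \frac{\partial w_k}{\partial \theta_k},
\qquad
\frac{\partial E}{\partial \tau}
= \sum_{n=1}^{N}\sum_{k=1}^{L}\sum_{i=1}^{M}
  \frac{\partial E}{\partial \beta_n}\,
  \frac{\partial \beta_n}{\partial w_k}\,
  \frac{\partial w_k}{\partial \alpha_k}\,
  \frac{\partial \alpha_k}{\partial \alpha_i^{k}}\,
  \frac{\partial \alpha_i^{k}}{\partial \tau}
```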
Since each parameter in the optimization model has corresponding constraints, the parameters should be projected back onto the feasible region after each gradient-based update. Thus, the steps of parameter optimization can be summarized as follows:
Step 1: The model parameters initially given by experts, namely the basic belief degrees, rule weights, attribute weights and parameters of the membership function, are taken as the initial value $\Omega_0$.
Step 2: Calculate the gradient $\nabla E(\Omega_t)$ of the objective function with respect to the optimization variable $\Omega_t$.
Step 3: The optimization variables are updated as follows:
where $\eta$ is the iteration step size, determined by a one-dimensional search method.
Step 4: Approximate projection operation. For the inequality constraints on each parameter, when the value of a parameter violates its constraint, it is set to the nearest bound; for example, if $\theta_k > 1$, then $\theta_k$ is set to 1. Moreover, the basic belief degrees of each belief rule are normalized so that they sum to 1, i.e., $\sum_{n=1}^{N} \beta_{n,k} = 1$.
Step 5: Calculate the gradient vector at this point and judge whether the termination condition of the algorithm is reached. If yes, stop; otherwise, let $t = t + 1$ and go to Step 3.
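The five steps above can be sketched as a projected gradient descent loop in Python. For brevity this sketch uses a fixed step size instead of the one-dimensional search, and `grad_fn`, the parameter layout and all names are illustrative assumptions.

```python
# Sketch of the projected gradient descent loop of Steps 1-5. grad_fn computes
# the gradient of the objective; the projection clips each parameter to [0, 1]
# and renormalizes each rule's belief degrees so they sum to 1.

def project(params, n_modes, n_rules):
    """Clip to [0, 1]; renormalize the belief block (first n_modes*n_rules)."""
    params = [min(max(p, 0.0), 1.0) for p in params]
    for k in range(n_rules):
        block = params[k * n_modes:(k + 1) * n_modes]
        s = sum(block) or 1.0
        params[k * n_modes:(k + 1) * n_modes] = [b / s for b in block]
    return params

def optimize(params, grad_fn, n_modes, n_rules, eta=0.05, iters=200, tol=1e-6):
    """Minimize the objective by projected gradient descent (fixed step eta)."""
    for _ in range(iters):
        grad = grad_fn(params)
        params = [p - eta * g for p, g in zip(params, grad)]
        params = project(params, n_modes, n_rules)
        if sum(g * g for g in grad) ** 0.5 < tol:   # termination condition
            break
    return params
```

Starting the search from the expert-given parameters, as in Step 1, is what preserves the initial expert knowledge during optimization.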
Finally, the fault diagnosis model proposed in this paper can be summarized as shown in
Figure 5.