2.1. Tensor Theory Introduction
The aging state of the thyristor is now treated as a material point, and the thyristor aging and failure process is assumed to satisfy the continuity requirement of the continuous-medium concept. In addition, the aging and failure process of the thyristor shows strong directionality, depending on the environmental parameters and the length of service time. Therefore, all the state points corresponding to the whole failure process of the thyristor are treated as a continuous medium in the following research. For a power electronic or mechanical component, the space formed by all the monitored parameters, from the initial perfect state to the complete failure state, is called the tensor domain.
The tensor domain state-evaluation theory is based on multi-source heterogeneous monitoring information and spatial classification. The operational aging state of a component or a system can be characterized from real-time monitoring information. Furthermore, the aging process and the rate of change of any aging state of a component or a system can be characterized by the velocity of the corresponding material point (state point).
In the continuous medium, different material points occupy different spatial positions at the same moment, which describes the spatial distribution of the material points. At different moments, the same material point can occupy different spatial positions, which describes the law of motion of that material point. The laws of motion of all the material points together constitute the law of motion of the continuous medium, which is usually described in Eulerian coordinates xi; these provide the stationary background against which the motion of the continuous medium is described. The spatial position of an arbitrary material point can be represented by the vector path r(x1, x2, x3). The covariant basis vector gi is the position function of the material point in space and can be expressed as gi = ∂r/∂xi.
The aging of a component or a system is generally described by a continuous state surface in three-dimensional space, as shown in Figure 1. According to the continuous medium theory, the aging process of a component or a system is continuous. The aging of a component or a system gradually transforms from the blue area to the yellow area: the blue area corresponds to the initial perfect state (i.e., the new, unused ideal state), while the yellow area corresponds to the complete failure state (i.e., the unserviceable state). For a certain aging state point, the aging velocity has several possible directions due to differences in environmental parameters and service time, which is reflected in the tensor domain as the possible direction of change of a material point not being unique. Aging state points under different service times are reflected in the tensor domain as state material points at different spatial locations.
For practical engineering applications, the aging state function in the tensor domain needs to be discretized to describe the operation state of a component or a system. The aging state function is divided by classification surfaces, and the tensor domain is thereby divided into several sub-regions, each corresponding to an aging state. An example with the aging state classified into four cases is shown in Figure 2.
The coordinates of the material point of the aging state of a component or a system are expressed in terms of the vector path. The coordinate values here are the parameters related to the thyristor operation or the characteristic quantities extracted from those parameters.
Applying the tensor theory to evaluate the aging state of a component or a system means treating the aging state as a material point in the tensor domain. First, the correspondence between the aging state and the tensor needs to be determined; then, the classification surfaces that divide the state material point tensor domain into several tensor subdomains are determined. The aging state material points in each tensor subdomain correspond to one class of aging state. The aging state identification and evaluation of a component or a system can be realized by such classification.
Here, we define the aging state classification surface decision function as:
This decision function assigns the different vector paths to different tensor state subdomains by means of the classification surface function.
In order to accurately identify the aging state of the thyristor, it is necessary to define the classification decision function, through which each aging state material point is assigned to the state tensor subdomain to which it belongs. The aging state identification results are expressed as:
where the aging state judgment function is a multi-valued symbolic function, and the classification of the state tensor domain into m state subdomains requires k classification surfaces. The value domain of the judgment function is the set of integers between 1 and m. Each integer represents a class of aging states, and the maximum value m means that the tensor domain state space is classified into a total of m state subdomains.
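The decision rule just described can be sketched in code. The following is a minimal illustration, not the paper's own implementation: the surface functions and the mapping from sign patterns to the m subdomain labels are hypothetical, since the source symbols were not recoverable here.

```python
# Sketch of the aging-state decision function: k classification-surface
# functions split the tensor domain, and the pattern of signs maps a state
# point to one of m subdomain labels. All names here are illustrative.

def classify_state(x, surfaces, label_map):
    """Map a state point x to one of m tensor subdomains.

    surfaces  : list of k functions f_j(x); f_j(x) >= 0 means x lies on
                the positive side of classification surface j.
    label_map : dict from the k-bit sign pattern to an integer label 1..m.
    """
    pattern = tuple(1 if f(x) >= 0 else 0 for f in surfaces)
    return label_map[pattern]

# Toy example: two parallel planes split a 1-D state axis into 3 subdomains.
surfaces = [lambda x: x[0] - 1.0, lambda x: x[0] - 2.0]
label_map = {(0, 0): 1, (1, 0): 2, (1, 1): 3}

print(classify_state([0.5], surfaces, label_map))  # subdomain 1
print(classify_state([1.5], surfaces, label_map))  # subdomain 2
print(classify_state([2.5], surfaces, label_map))  # subdomain 3
```

In a real evaluation the surfaces would come from the classifier trained in Section 2.3 rather than being written by hand.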
For different types of power electronic components, differences in the production process lead to differences in their aging laws, i.e., their corresponding state space tensor domains are different. Furthermore, the aging law of a component or a system is closely related to the environmental conditions, whose time-varying characteristics indirectly lead to time-varying characteristics of the aging state. In other words, the tensor domain corresponding to the aging state material points deforms over time, and this real-time deformation can be expressed by the partial derivative of the material point vector path with respect to time.
According to the previous discussion, the vector path of a state material point in a continuous medium is related to both the spatial coordinates of the state material point and time. The velocity of the state material point can be expressed as the partial derivative with respect to time:
As can be seen from Equation (3), the velocity is related to both time and the material point.
Assuming that the time remains constant, the partial derivative of the state material point vector path with respect to its coordinate position represents the covariant basis vector at that moment. The physical meaning of the covariant basis vector is the deformation of a state material point in the tensor domain at that time. This covariant basis vector is then differentiated with respect to time to obtain the deformation rate of the continuous medium in the tensor domain corresponding to the aging state:
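The quantities in this passage can be written out explicitly. The following is a hedged reconstruction in standard continuum-mechanics notation, using the vector path r(x1, x2, x3) and the covariant basis vector gi already introduced above; the exact form of the source equations was not recoverable.

```latex
% Velocity of a state material point, covariant basis vector at fixed time,
% and deformation rate of the tensor domain (reconstructed notation):
\mathbf{v} = \frac{\partial \mathbf{r}(x^1, x^2, x^3, t)}{\partial t}, \qquad
\mathbf{g}_i = \left.\frac{\partial \mathbf{r}}{\partial x^i}\right|_{t=\mathrm{const}}, \qquad
\dot{\mathbf{g}}_i = \frac{\partial \mathbf{g}_i}{\partial t}
  = \frac{\partial^2 \mathbf{r}}{\partial t\,\partial x^i}.
```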
2.2. Classification of Tensor Subdomain
In this section, the thyristor degradation model based on the tensor domain estimation theory is applied to automatically locate the classification surfaces of each tensor subdomain using a temporal fuzzy clustering algorithm [18].
Here, we consider a sample series X, where N is the data length of the time series X. Each sample may be multivariate, in which case M is the number of dimensions of the sample.
A segment of X represents a time-continuous series of sample points. We assume that the sample series X can be divided into c non-overlapping segments, denoted as:
where
The sample series segmentation objective function is generally defined as the distance between the true data points of the sample series and the data points of the sample series fitting function (generally a linear or a polynomial function).
The region boundaries are calculated by Equation (7) to obtain the optimal segment locations. Currently, dynamic programming and various heuristic algorithms, as in Equation (8), are commonly used to minimize the objective function.
where the quantities are, respectively, the clustering center of the kth tensor subdomain, the distance from the sample to that cluster center, and the membership function of the sample belonging to the kth tensor subdomain.
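The objective just described, a membership-weighted sum of squared distances to the cluster centers, can be sketched as follows. The variable names and the scalar 1-D setting are illustrative assumptions, not the paper's notation.

```python
# Sketch of the segmentation objective: the weighted sum of squared
# distances between sample points and cluster centers, with memberships
# raised to the fuzzy weighting index m.

def objective(X, centers, mu, m=2):
    """X: list of N scalar samples; centers: c cluster centers;
    mu[i][k]: membership of sample i in subdomain k (rows sum to 1)."""
    total = 0.0
    for i, x in enumerate(X):
        for k, v in enumerate(centers):
            total += (mu[i][k] ** m) * (x - v) ** 2
    return total

X = [0.0, 0.1, 1.0, 1.1]
centers = [0.05, 1.05]
mu = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(objective(X, centers, mu))  # 4 * 0.05**2 ≈ 0.01
```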
In general, the membership function is a binary function taking values in {0, 1}, i.e., a crisp degree of membership is applied. However, in practical situations there are fuzzy boundaries between the tensor subdomains, for which a crisp membership function is not suitable. Therefore, a multivariate mixed Gaussian function, shown as Equation (9), is applied as the sequence-fitting function of the clustering prototype, and the optimal region division is obtained by minimizing the sum of squared weighted distances between the data points and the centers of the clustering prototypes.
where the quantities are, respectively, the membership of the sample data point in the kth tensor subdomain; the fuzzy clustering weighting index m, whose value is generally 2; the distance between the sample and the clustering prototype; and the clustering prototype function of the kth tensor subdomain, which is the multivariate mixed Gaussian function.
We assume that the sample sequence obeys a multivariate Gaussian distribution with a given expectation and covariance matrix. Equation (10) denotes the probability density function of the sample data points belonging to the c tensor subdomains.
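For concreteness, the Gaussian density used as the clustering prototype can be written out. This sketch is restricted to a diagonal covariance so that it stays dependency-free; the full prototype in Equations (9) and (10) uses a full covariance matrix, and the names here are illustrative.

```python
# Diagonal-covariance multivariate Gaussian density, the building block of
# the mixed-Gaussian clustering prototype (illustrative simplification).
import math

def gaussian_pdf(x, mean, var):
    """Density at x for independent components with means `mean`
    and variances `var`."""
    p = 1.0
    for xj, mj, vj in zip(x, mean, var):
        p *= math.exp(-0.5 * (xj - mj) ** 2 / vj) / math.sqrt(2 * math.pi * vj)
    return p

p = gaussian_pdf([0.0, 0.0], mean=[0.0, 0.0], var=[1.0, 1.0])
print(round(p, 6))  # 1/(2*pi) ≈ 0.159155
```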
where the clustering prototype function of the kth tensor subdomain is as defined above. The distance function is inversely proportional to the membership of the sample in the kth tensor subdomain, and the time variable and the feature variable in the sample data are independent of each other. Then, we can obtain:
where the terms are, respectively, the initial probability of clustering; the distance between the time variable of the ith sample data point and the center of the clustered time variable; the distance between the feature variable of the ith sample data point and the center of the clustering feature variable; the clustering center of the kth tensor subdomain in the feature space; and the rank r of the characteristic variable distance.
The covariance can be estimated by the fuzzy covariance matrix of the multivariate Gaussian distribution, which is calculated as:
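The fuzzy covariance estimate is a membership-weighted covariance around the cluster center, and can be sketched as follows; the function and variable names are illustrative assumptions.

```python
# Sketch of the fuzzy covariance estimate for subdomain k: each sample's
# outer product of deviations from the cluster center is weighted by its
# membership raised to the fuzzy index m, then normalized.

def fuzzy_covariance(X, v, mu_k, m=2):
    """X: list of M-dim samples; v: cluster center; mu_k[i]: membership
    of sample i in subdomain k."""
    M = len(v)
    num = [[0.0] * M for _ in range(M)]
    den = 0.0
    for x, mu in zip(X, mu_k):
        w = mu ** m
        den += w
        d = [xj - vj for xj, vj in zip(x, v)]
        for a in range(M):
            for b in range(M):
                num[a][b] += w * d[a] * d[b]
    return [[num[a][b] / den for b in range(M)] for a in range(M)]

X = [[1.0, 0.0], [-1.0, 0.0]]
F = fuzzy_covariance(X, v=[0.0, 0.0], mu_k=[1.0, 1.0])
print(F)  # [[1.0, 0.0], [0.0, 0.0]]
```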
To facilitate the inversion of the covariance matrix, the strong correlation between variables must be eliminated. Principal component analysis (PCA) performs a series of operations and transformations on high-dimensional data to eliminate correlations, achieving dimensionality reduction while retaining as much of the information of the original variables as possible [19]. We assume the covariance matrix has q non-zero eigenvalues (in descending order) with corresponding eigenvectors. Then, we can obtain:
where
For the feature variable of the sample data, the PCA algorithm is applied to reduce it to q dimensions. The reduced representation can also be calculated by the probabilistic principal component analysis (PPCA) algorithm [20], in which a q × q orthogonal transformation matrix is involved and can be calculated as:
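The eigendecomposition step behind the PCA reduction can be illustrated with power iteration, which extracts the dominant eigenpair of the covariance matrix (q = 1 here). A real implementation would call a library eigendecomposition; this dependency-free sketch and its names are assumptions.

```python
# Power iteration for the dominant eigenvalue/eigenvector of a symmetric
# covariance matrix: repeatedly multiply and renormalize a trial vector.

def leading_component(cov, iters=200):
    """Return (eigenvalue, eigenvector) for the dominant eigenpair."""
    n = len(cov)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(n)) for a in range(n))
    return lam, v

cov = [[4.0, 0.0], [0.0, 1.0]]
lam, v = leading_component(cov)
print(round(lam, 6))  # dominant eigenvalue 4.0
```

Projecting each sample onto the top q such eigenvectors gives the q-dimensional feature representation used above.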
So far, the automatic fuzzy tensor subdomain classification algorithm has been converted into an optimization problem. The objective function is:
The constraint conditions are:
The optimization problem can be solved by the alternating optimization (AO) algorithm [21], and the basic steps are as follows.
Step 1: Initialization
The number of segments of the sample sequence X and the dimensionality of the feature vector space retained by the PCA algorithm are given. A suitable termination condition is chosen, and the parameters are initialized.
Step 2: Loop Computation
First, the parameters of the clustering prototype are calculated by Equations (20)–(27).
The initial probability of clustering can be calculated as:
The clustering center is:
where
The weights can be updated as:
The variance can be updated as:
The characteristic variable distance can be calculated as:
The clustering prototype center for the sample series time is:
The variance for the sample series time is:
Second, the clusters are merged according to Equations (28)–(31). For two adjacent tensor subdomains, their similarity is compared to determine whether they need to be merged. Since the PCA algorithm is used as mentioned before, a PCA-based similarity factor, calculated as Equation (28), is applied as one of the merging criteria.
where the two subspaces are spanned by the first q principal components of the feature vectors of the two adjacent tensor subdomains, respectively.
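A common form of such a PCA-based similarity factor is the average squared cosine of the angles between the two principal subspaces; the following sketch assumes that form and orthonormal component lists, since the exact Equation (28) was not recoverable here.

```python
# Sketch of a PCA-based similarity factor between two subdomains: the mean
# squared inner product between their orthonormal principal components.
# Identical subspaces give 1; orthogonal subspaces give 0.

def pca_similarity(U, V):
    """U, V: lists of q orthonormal principal components (each a list)."""
    q = len(U)
    s = 0.0
    for u in U:
        for v in V:
            s += sum(ui * vi for ui, vi in zip(u, v)) ** 2
    return s / q

U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # subspace spanned by e1, e2
W = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]   # subspace spanned by e3, e1
print(pca_similarity(U, U))  # 1.0 (identical subspaces)
print(pca_similarity(U, W))  # 0.5 (one shared direction out of two)
```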
Another merging criterion is the distance between the clustering centers of the feature vectors of the two tensor subdomains, which is calculated as:
Since the clustering process is performed within the sample sequence, a fuzzy decision algorithm is applied to measure the clustering similarity of each tensor subdomain as a whole. The overall similarity matrix of the decision process is:
where
When the overall similarity is greater than the set threshold, the two tensor subdomains are merged. The loop computation ends when the algorithm termination condition is reached.
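The alternating structure of Step 2, updating memberships with the centers fixed and then centers with the memberships fixed, can be sketched with a plain fuzzy-c-means loop on 1-D toy data. This is only the AO skeleton under simplifying assumptions, not the paper's full prototype (no time/feature split, no PCA, no subdomain merging), and all names are illustrative.

```python
# Skeleton of the alternating-optimization (AO) loop: membership update and
# center update alternate until the fixed iteration budget is spent.

def fcm(X, c=2, m=2.0, iters=50):
    centers = [min(X), max(X)]  # crude initialization for c = 2
    for _ in range(iters):
        # Membership update with centers fixed:
        # mu_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        mu = []
        for x in X:
            d = [abs(x - v) + 1e-12 for v in centers]  # avoid divide-by-zero
            mu.append([1.0 / sum((d[k] / d[j]) ** (2 / (m - 1))
                                 for j in range(c)) for k in range(c)])
        # Center update with memberships fixed: weighted mean of the samples.
        centers = [sum((mu[i][k] ** m) * X[i] for i in range(len(X))) /
                   sum(mu[i][k] ** m for i in range(len(X))) for k in range(c)]
    return centers, mu

X = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers, mu = fcm(X)
print(sorted(round(v, 1) for v in centers))  # ≈ [0.1, 5.1]
```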
2.3. Data-Processing Algorithms
2.3.1. Improved Adaptive Synthetic Sampling Algorithm
The main principle of the adaptive synthetic sampling algorithm is to adaptively increase the size of the minority data classes according to the data density distribution, so as to enhance the sensitivity of the classifier to the minority classes, especially for those samples that are difficult to learn. The main shortcoming of the adaptive synthetic sampling algorithm is that it is generally applicable only to binary classification problems, and it can behave undesirably when there are abnormal data samples. In this section, an improved adaptive synthetic sampling algorithm capable of handling multi-class problems is proposed. The specific algorithm steps are introduced below [22].
Step 1: Sample Sorting
The majority category of the data samples is identified from the sample labels, as shown in Equation (32).
where the category sample sizes are arranged in descending order.
Step 2: Expansion of the First Category
The imbalance degree between the first and the second type of data is defined as:
To minimize the generation of samples on subsequent decision boundaries, the remaining categories are arranged according to the size of their data samples. An acceptable inter-category imbalance threshold is set. If the imbalance degree does not exceed the threshold, the data volume of the category is considered sufficient, and no new samples need to be synthesized for it. On the other hand, if the imbalance degree exceeds the threshold, it is necessary to synthesize new samples for the category to expand its data volume. The synthesis is realized in three steps.
First, calculate the amount of data to be synthesized for the few category expansions as:
where the balance parameter indicates the level of data balance that will be achieved after data expansion for the minority categories. When it is 1, a fully balanced amount of category data is achieved; when it is 0, no new samples are synthesized. For each sample of the category to be expanded, its k nearest neighbor samples can be obtained based on the Euclidean distance. The probability that these nearest neighbor samples belong to a given category can be calculated as:
where the numerator is the number of these nearest neighbor samples belonging to that category.
Second, this probability is normalized to obtain the ratio distribution as:
Third, the required number of samples to be synthesized can be calculated as:
For each sample of the category to be expanded, the corresponding number of data samples is added, and the expanded sample number of the category can be calculated as:
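The bookkeeping of this three-step synthesis can be sketched as follows: a balance parameter beta scales the total number of samples to synthesize, the per-sample neighborhood ratios are normalized into a distribution, and each minority sample receives a quota proportional to its ratio. The names and the interpretation of the ratios follow standard adaptive synthetic sampling and are illustrative assumptions here.

```python
# Sketch of the synthesis bookkeeping: total amount G, normalized ratio
# distribution r_hat, and per-sample quotas g_i.

def synthesis_quota(n_major, n_minor, ratios, beta=1.0):
    """ratios[i]: fraction of the k nearest neighbours of minority sample i
    that belong to other classes (harder samples get larger quotas)."""
    G = (n_major - n_minor) * beta          # total samples to synthesize
    s = sum(ratios)
    r_hat = [r / s for r in ratios]         # normalized ratio distribution
    return [round(r * G) for r in r_hat]    # per-sample quota g_i

quotas = synthesis_quota(n_major=100, n_minor=20, ratios=[0.8, 0.8, 0.2, 0.2])
print(quotas)       # [32, 32, 8, 8]
print(sum(quotas))  # 80 new samples in total
```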
Step 3: Expansion of the Second Category
Similar to Step 2 above, the imbalance degrees of the data volume of the third category with respect to the first category and the second category (after the expansion of the samples) are calculated as Equation (39). The imbalance degrees are compared with the set inter-category imbalance threshold to determine the numbers of new samples needed.
The ratios of the nearest neighbor samples to all samples are calculated as Equation (41).
The ratio distribution is then calculated as Equation (42).
The amount of data needed to expand the new sample can be calculated as:
Then, the second category samples can be expanded, and the expanded sample number of the category can be calculated as:
Step 4: Expansion of Other Categories
The process of Step 3 is repeated to expand the sample data of the other categories. To eliminate the influence of abnormal data points: if, during the calculation, a sample data point is found to be completely surrounded by data of the other categories, it is considered an abnormal sample point and eliminated. The flow chart of the improved adaptive synthetic sampling algorithm is shown in Figure 3.
2.3.2. Gradient-Boosting Decision Tree Algorithm
In this section, the gradient-boosting decision tree algorithm is applied as a classifier for the thyristor state evaluation to obtain the tensor subdomain classification surfaces. The algorithm is derived from the decision tree: the errors are corrected through a differentiable loss function during the training process, which enables further convergence. The training process of the gradient-boosting decision tree algorithm is shown in Figure 4. The algorithm obtains one weak classifier in each iteration, and each new classifier is trained on the residuals of the previous one. During the training process, the bias-reduction approach is applied to increase the classifier accuracy so as to obtain the final classifier with the best accuracy. Generally, classification decision trees are chosen as the base classifiers of the algorithm, and these trees are kept shallow, as required of base classifiers. Finally, the total classification model is obtained by weighting and summing the weak classifiers obtained in each training round.
The total classification model can be expressed as:
where x is the input sample, t is the classification tree, and the remaining symbols denote the parameters of the weak classifiers and the weight of each tree. The model is trained for a total of M rounds, and a weak classifier is generated in each training round. The corresponding weak classifier loss function is:
Here, the accumulated sum of the previous weak classifiers is the current classifier model. Generally, the best parameters of the next weak classifier are obtained by the empirical risk-minimization method when applying the gradient-boosting decision tree algorithm. Specifically, this amounts to the choice of the loss function, which mainly includes the squared loss function, the logarithmic loss function, and the 0–1 loss function. For the squared loss function, the negative gradient coincides with the concept of residuals. The fastest descent is obtained by letting the loss function fall along the gradient direction, which is the reason for the use of the gradient in the algorithm. The algorithm fits the decision tree of each round to the negative gradient of the loss function of the current weak classifier, so as to reduce the loss function as quickly as possible during the training process and converge to a local or global optimum.
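The residual-fitting idea can be made concrete with a minimal sketch: under squared loss the negative gradient is exactly the residual, so each round fits a depth-1 "tree" (a stump) to the residuals of the current ensemble. This is an illustration on 1-D toy data under simplifying assumptions, not the paper's production classifier.

```python
# Minimal gradient boosting with squared loss: each stump fits the current
# residuals (the negative gradient), and the ensemble is their weighted sum.

def fit_stump(X, r):
    """Best single-split stump minimizing squared error against residuals r."""
    best = None
    for s in sorted(set(X)):
        left = [r[i] for i in range(len(X)) if X[i] <= s]
        right = [r[i] for i in range(len(X)) if X[i] > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((v - lm) ** 2 for v in left) +
               sum((v - rm) ** 2 for v in right))
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    _, s, lm, rm = best
    return lambda x: lm if x <= s else rm

def gbdt(X, y, rounds=20, lr=0.5):
    pred = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        resid = [y[i] - pred[i] for i in range(len(X))]  # negative gradient
        t = fit_stump(X, resid)
        stumps.append(t)
        pred = [pred[i] + lr * t(X[i]) for i in range(len(X))]
    return lambda x: sum(lr * t(x) for t in stumps)

X = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 0.0, 1.0, 1.0]
model = gbdt(X, y)
print(round(model(0.5), 3), round(model(2.5), 3))  # ≈ 0.0 and ≈ 1.0
```

The learning rate deliberately shrinks each stump's contribution, which is why many rounds are needed; this is the usual bias-reduction trade-off mentioned above.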
The steps of the gradient-boosting decision tree algorithm are as follows.
Step 1: Extraction of Relevant Features
The dataset to characterize the thyristor state is established based on the environmental conditions related to the operation of the converter valve in the UHVDC transmission project (including the temperature and humidity of the valve hall and the electromagnetic interference situation), the thyristor operating voltage and current, and the field test results.
Step 2: Comparison of the Imbalance Degree between Categories
The inter-category imbalance threshold is set to 0.5 in the evaluation, and the inter-category imbalance degree of the dataset is compared with this threshold. If the imbalance degree of the dataset is higher than the threshold, the improved adaptive synthetic sampling algorithm introduced above is applied to balance the data in the dataset. Otherwise, no data balancing is performed.
Step 3: Training
The gradient-boosting decision tree algorithm is applied to train the data in the dataset, and the tensor domain state classification surface is obtained by the training.
Step 4: Verification of the Algorithm and Identification of the Test Samples
The reliability of the algorithm is verified by the existing data. The algorithm can also be further applied to identify the state of the test samples.
In summary, the flowchart of the gradient-boosting decision tree algorithm is shown in Figure 5.