Article

A KPI-Based Probabilistic Soft Sensor Development Approach that Maximizes the Coefficient of Determination

1 Key Laboratory of Knowledge Automation for Industrial Processes of Ministry of Education, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
2 Department of Automation Engineering, Technical University of Ilmenau, 98684 Ilmenau, Thuringia, Germany
* Author to whom correspondence should be addressed.
Sensors 2018, 18(9), 3058; https://doi.org/10.3390/s18093058
Submission received: 28 July 2018 / Revised: 27 August 2018 / Accepted: 7 September 2018 / Published: 12 September 2018
(This article belongs to the Collection Multi-Sensor Information Fusion)

Abstract

Advanced technology for process monitoring and fault diagnosis is widely used in complex industrial processes. An important issue that needs to be considered is the ability to monitor key performance indicators (KPIs), which often cannot be measured sufficiently quickly or accurately. This paper proposes a data-driven approach based on maximizing the coefficient of determination for probabilistic soft sensor development when data are missing. Firstly, the problem of missing data in the training sample set is solved using the expectation maximization (EM) algorithm. Then, by maximizing the coefficient of determination, a probability model between secondary variables and the KPIs is developed. Finally, a Gaussian mixture model (GMM) is used to estimate the joint probability distribution in the probabilistic soft sensor model, whose parameters are estimated using the EM algorithm. An experimental case study on the alumina concentration in the aluminum electrolysis industry is investigated to demonstrate the advantages and the performance of the proposed approach.

1. Introduction

With the increasing demands placed on industry, requiring a decrease in the defective rate of products, better economic efficiency, and improved safety, there has been a growing demand to develop and implement approaches that can improve the overall control strategy [1]. The first issue that needs to be solved is achieving accurate and real-time estimation of key performance indicators (KPIs) [2]. The difficulty is that these KPIs are usually not easy to measure, or the measurement has significant time delay. Even if some KPIs are measurable, due to the complexity and nonlinearity of modern industrial systems and their complex working conditions, the KPIs may be extremely unreliable [3]. One way to solve the above problems is to develop a soft sensor, which seeks to select a group of easier-to-measure secondary variables that are correlated with the required primary variables (i.e., KPIs in this paper), so that the system is capable of providing process information as often as necessary for control [4,5]. In the development of a successful soft sensor, a good process model is required. The process models can be divided into two major categories: first principles models and data-driven models [6,7]. Although it is desirable to apply mass and energy balances to build a complete first principles model, lack of process knowledge, plant–model mismatch, and nonlinear characteristics limit the applicability of such an approach to the simplest processes. As an alternative, data-driven soft sensors are developed from historical data without necessarily considering any outside process knowledge. Data-driven soft sensors, which solely use available process data to develop a model of the process, have recently attracted considerable attention and have been successfully applied in many fields [8], such as fault detection (FD) and process monitoring, that are important for many industrial processes. 
Serdio [9] introduced an improved fault detection approach based on residual signals extracted online from system models identified by high-dimensional measurements provided by the multisensor network. The data-driven system identification model can also be combined using multivariate orthogonal space transformations and vectorized time-series models to achieve enhanced residual-based fault detection in condition monitoring systems equipped with a multisensor network [10]. Shardt [11] proposed a data-driven design of a diagnostic-observer-based process monitoring method, which was extended to include the ability to detect changes given infrequent KPI measurements. Yan [12] and Gabrys [13] introduced the most popular data-driven soft sensor modelling techniques, as well as discussing some issues in soft sensor development and maintenance and their possible solutions. Data-driven methods can be divided into three categories: models based on statistical analysis, models based on statistical learning theory [14], and models based on artificial intelligence [15].
Of interest for this paper are models developed using statistical methods to extract the relevant information from the large amounts of industrial data that are produced by the complex processes. Statistical methods have been developed that can handle such large datasets and develop useful models. Common methods include principal component analysis (PCA) [16] and partial least squares (PLS) [17]. PCA is a powerful tool for data compression and information extraction that can simplify the model structure and improve the speed of operations. However, PCA can only deal with the correlations between vectors in the same matrix. To overcome this limitation, PLS was developed as an approach that models the correlation between independent variables and dependent variables. Since PLS only applies to linear systems or weakly nonlinear systems, many nonlinear PLS algorithms have been developed to handle nonlinear systems. The neural-network-based PLS algorithm [18] uses the nonlinear processing capability of a neural network to describe the relationship between variables. However, the determination of the network structure and the selection of network training algorithms are difficult problems. In addition, if there are too many datapoints, the model structure will be very complex and the accuracy will be difficult to guarantee.
On the other hand, since data-driven modeling methods use historical data for training, the question arises of how to handle missing data. Owing to issues such as sensor reliability and multirate sampling, missing data are common in practical industrial processes [19,20]. For example, in the aluminum electrolysis process, the alumina concentration is usually obtained manually by laboratory staff. Because of human factors and the limited reliability of chemical examination equipment, data loss occurs from time to time. Such incomplete measurements affect both the soft sensor modeling process and the state estimation performance. Therefore, in order to make the soft sensor more suitable for practical, complex industrial processes, the missing data problem needs to be taken seriously. Compared with the direct deletion of missing data, the data interpolation method [21] is better able to restore the real situation. Currently, data interpolation methods include the mean substitution method, the regression interpolation method, and the expectation maximization (EM) algorithm. Of these, the mean substitution method can cause biased estimates, and the regression interpolation method is built on a complete data set and requires a linear relationship between the variables with missing values and the other variables, which, in many cases, cannot be satisfied. By contrast, the EM algorithm has good practical value as an iterative algorithm that simplifies maximum likelihood estimation when dealing with missing data in sample sets [22].
Recently, the coefficient of determination has been used to evaluate the accuracy of model outputs. The coefficient of determination measures how well a regression model fits the data [23]. Feng [24] introduced the coefficient of determination as a criterion for comparing the best-wavelength partial least squares regression (PLSR) model with the full-wavelength model. Boyaci [25] used the coefficient of determination to evaluate the adulteration rate of coffee beans, thus ensuring coffee quality. However, these applications only consider the coefficient of determination as an evaluation index, without applying it to the modeling process itself. In general, the coefficient of determination is a criterion that can evaluate the quality of a model and has a concise structure, so it is appropriate to apply it to the soft sensor development process to establish a simpler and more accurate model for complex industrial processes.
Therefore, this paper develops a KPI-based soft sensor model with a simple structure and high accuracy, using the coefficient of determination method, and also solves the missing data issue using the EM algorithm.

2. Background

2.1. The Gaussian Mixture Model

As a flexible and efficient tool for probabilistic data models, a Gaussian mixture model (GMM) can be used to define any complex probability distribution function and is therefore widely used in many statistical data modelling applications. In this paper, GMM is used to approximate the joint probability distribution in the soft sensor probability model. The reason for introducing GMM is that, theoretically, any probability distribution can be approximated using a weighted sum of Gaussian distributions [26].
If $x$ represents a multidimensional random variable, then the joint probability distribution of the GMM is expressed as
$$p(x \mid \Theta) = \sum_{l=1}^{M} \alpha_l \, p_l(x \mid \theta_l) \tag{1}$$
where $\alpha_l$ is the mixing coefficient, which represents the prior probability of each mixed component; $M$ is the number of mixed components; and $\sum_{l=1}^{M} \alpha_l = 1$. $\Theta = (\theta_1, \theta_2, \ldots, \theta_M)$ is the parameter vector of the mixed components, and each Gaussian probability density function $p_l(x)$ is determined by the parameter $\theta_l = (\mu_l, \Sigma_l)$, where $\mu_l$ is the mean and $\Sigma_l$ is the covariance matrix. The GMM parameters $\alpha_l$, $\mu_l$, and $\Sigma_l$ ($l = 1, 2, \ldots, M$) are estimated using the EM algorithm.
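As a rough illustration of the mixture density above, the following Python sketch evaluates a GMM at a point. The function names `gaussian_pdf` and `gmm_pdf` are hypothetical and not taken from the paper; NumPy is assumed to be available.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density G(x; mean, cov)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = x.size
    diff = x - mean
    # Squared Mahalanobis distance, computed with a linear solve
    # rather than an explicit matrix inverse.
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm)

def gmm_pdf(x, alphas, means, covs):
    """Mixture density: sum over l of alpha_l * G(x; mu_l, Sigma_l)."""
    if not np.isclose(sum(alphas), 1.0):
        raise ValueError("mixing coefficients must sum to 1")
    return sum(a * gaussian_pdf(x, m, c)
               for a, m, c in zip(alphas, means, covs))
```

A mixture whose components all share the same mean and covariance reduces to a single Gaussian, which is a convenient sanity check for such a routine.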

2.2. The Expectation Maximization Algorithm

The EM algorithm is a maximum likelihood estimation method for solving model distribution parameters from “incomplete data” and was first introduced in [27]. Each iteration of the algorithm involves two steps, called the expectation step (E-step) and the maximization step (M-step).

2.2.1. E-Step

Given the observation data set $X$ and the current parameters $\Gamma^{(i)}$, the expectation of the log-likelihood function is called the Q-function, which can be written as
$$Q(\Gamma, \Gamma^{(i)}) = E\left[\log p(X, \gamma \mid \Gamma) \mid X, \Gamma^{(i)}\right] \tag{2}$$
where $\gamma$ can represent missing data due to observational conditions and other reasons, and can also refer to hidden variables. Since the direct optimization of the likelihood function is usually very difficult, the additional variable $\gamma$ is introduced to establish the relationship between $X$, $\Gamma$, and $\gamma$, which simplifies the likelihood function.

2.2.2. M-Step

A new parameter $\Gamma^{(i+1)}$ is calculated by maximizing $Q(\Gamma, \Gamma^{(i)})$, which was obtained from the E-step; that is,
$$\Gamma^{(i+1)} = \arg\max_{\Gamma} Q(\Gamma, \Gamma^{(i)}). \tag{3}$$
The iteration between the E- and M-steps continues until the change in the elements of $\Gamma$ between successive iterations is less than a given threshold.

2.3. The Coefficient of Determination

Analysis of variance is an approach for determining the significance and validity of a regression model using variances obtained from the data and model. The coefficient of determination is an analysis of variance approach that seeks to decompose the total variability in the data into various orthogonal components that can then be independently analyzed [23]. For the purposes of analyzing the regression, let the total sum of squares, denoted by TSS, be defined as
$$\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \tag{4}$$
where the real data set is represented as $y = \langle y_1, y_2, \ldots, y_n \rangle$ and $\bar{y}$ refers to the average of the $y_i$. Let the sum of squares due to regression, SSR, be defined as
$$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \tag{5}$$
where $\hat{y}_i$ denotes the predicted value of the regression model for $y_i$. The coefficient of determination $R^2$ represents the ratio of SSR to TSS, that is,
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{TSS}}. \tag{6}$$
Let the sum of squares due to the error, SSE, be defined as
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2. \tag{7}$$
It can be proved that TSS = SSR + SSE [23,28], so $R^2$ can also be expressed as
$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{TSS}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}. \tag{8}$$
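The sums of squares above can be checked numerically. The sketch below uses a small synthetic data set (not from the paper) and an ordinary least-squares line with an intercept, a setting in which the decomposition TSS = SSR + SSE is known to hold; the function names are hypothetical.

```python
import numpy as np

def sums_of_squares(y, y_hat):
    """Return (TSS, SSR, SSE) for observations y and predictions y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    y_bar = y.mean()
    tss = float(np.sum((y - y_bar) ** 2))
    ssr = float(np.sum((y_hat - y_bar) ** 2))
    sse = float(np.sum((y - y_hat) ** 2))
    return tss, ssr, sse

def r_squared(y, y_hat):
    """Coefficient of determination in the form 1 - SSE / TSS."""
    tss, _, sse = sums_of_squares(y, y_hat)
    return 1.0 - sse / tss

# Synthetic, nearly linear data (illustrative values only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.9, 5.2, 7.1, 8.8])

# Least-squares fit of y = b0 + b1 * x.
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

tss, ssr, sse = sums_of_squares(y, y_hat)
```

For predictions that do not come from a least-squares fit with an intercept, the two expressions for $R^2$ need not agree, so the 1 − SSE/TSS form is the one used in `r_squared`.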

3. Development of the Probabilistic Soft Sensor Model

In this section, in order to obtain more accurate KPI estimates, a soft sensor development approach based on maximizing the coefficient of determination is proposed. In addition, the problem of missing data in the training sample set is also considered. In order to more clearly describe the soft sensor development process, Figure 1 shows the modeling flow chart.

3.1. EM Algorithm Handling Missing Data

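Let $X_1, X_2, \ldots, X_n$ be a random sample from a $p$-variate normal population, where $X_j = (x_{j1}, x_{j2}, \ldots, x_{jp})$, $1 \le j \le n$, so the training sample set $X$ can be written as
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}. \tag{9}$$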
The basic steps for processing missing data using the EM algorithm are given in [29].

3.1.1. E-Step: Prediction

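For each sample $X_j$ containing missing values, write $X_j = (m_j, a_j)$, where $m_j$ denotes the missing values and $a_j$ the available values. Given the estimates of the population mean and covariance from the $i$th iteration, $\tilde{\mu}^i$ and $\tilde{\Sigma}^i$, and given $a_j$, we use the expectation of the conditional normal distribution of $m_j$ as the estimate of the missing values. The $(i+1)$th iteration is
$$\tilde{m}_j^{i+1} = E(m_j \mid a_j, \tilde{\mu}^i, \tilde{\Sigma}^i) = \tilde{\mu}_m^i + \tilde{\Sigma}_{ma}^i (\tilde{\Sigma}_{aa}^i)^{-1} (a_j - \tilde{\mu}_a^i) \tag{10}$$
where $\tilde{\mu}^i$ is a $p \times 1$ matrix defined as $\tilde{\mu}^i = [\tilde{\mu}_m^i, \tilde{\mu}_a^i]$, $\tilde{\mu}_m^i$ is the mean of the missing part, and $\tilde{\mu}_a^i$ is the mean of the available part. In addition, $\tilde{\Sigma}^i$ can be written as
$$\tilde{\Sigma}^i = \begin{bmatrix} \tilde{\Sigma}_{mm}^i & \tilde{\Sigma}_{ma}^i \\ \tilde{\Sigma}_{am}^i & \tilde{\Sigma}_{aa}^i \end{bmatrix}. \tag{11}$$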

3.1.2. M-Step: Estimation

We compute the maximum likelihood estimates as follows:
$$\tilde{\mu}^{i+1} = \bar{X}^{i+1} \tag{12}$$
$$\tilde{\Sigma}^{i+1} = \frac{(n-1) S^{i+1}}{n} \tag{13}$$
where $\bar{X}^{i+1}$ is the sample mean and $S^{i+1}$ is the sample covariance matrix; both are sufficient statistics. For a normal population, the importance of sufficient statistics is that the total information about $\mu$ and $\Sigma$ in the data matrix $X$ is contained in $\bar{X}$ and $S$, regardless of the sample size $n$. By transforming $\bar{X}$ and $S$, two new sufficient statistics $T_1$ and $T_2$ [29], given by
$$T_1 = n \bar{X} \tag{14}$$
$$T_2 = (n-1) S + n \bar{X} \bar{X}^{T} \tag{15}$$
are obtained. Combining Equations (14) and (15) with Equations (12) and (13) gives
$$\tilde{\mu}^{i+1} = \frac{T_1^{i+1}}{n} \tag{16}$$
$$\tilde{\Sigma}^{i+1} = \frac{1}{n} T_2^{i+1} - \tilde{\mu}^{i+1} (\tilde{\mu}^{i+1})^{T} \tag{17}$$
where the terms involving missing components are replaced by their conditional expectations:
$$\widetilde{m_j m_j^{T}}^{\,i+1} = E(m_j m_j^{T} \mid a_j, \tilde{\mu}^i, \tilde{\Sigma}^i) = \tilde{\Sigma}_{mm}^i - \tilde{\Sigma}_{ma}^i (\tilde{\Sigma}_{aa}^i)^{-1} \tilde{\Sigma}_{am}^i + \tilde{m}_j^{i+1} (\tilde{m}_j^{i+1})^{T} \tag{18}$$
$$\widetilde{m_j a_j^{T}}^{\,i+1} = E(m_j a_j^{T} \mid a_j, \tilde{\mu}^i, \tilde{\Sigma}^i) = \tilde{m}_j^{i+1} a_j^{T}. \tag{19}$$
The iteration between the E- and M-steps continues until the changes in the elements of $\tilde{\mu}$ and $\tilde{\Sigma}$ between successive iterations are less than a given threshold. The resulting $\tilde{m}$ is then the optimal substitution for the missing values, yielding a complete training sample set $X$.
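The prediction and estimation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: missing entries are marked with NaN, the initial mean and covariance come from mean-filled data, a small ridge term is added only to the initial covariance, and the name `em_impute` and its parameters are hypothetical.

```python
import numpy as np

def em_impute(X, max_iter=200, tol=1e-8):
    """EM imputation for a multivariate normal sample; NaN marks missing."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Assumed initialization: observed column means and the covariance
    # of the mean-filled data, plus a small ridge.
    mu = np.nanmean(X, axis=0)
    filled = np.where(miss, mu, X)
    sigma = np.cov(filled, rowvar=False, bias=True).reshape(p, p)
    sigma = sigma + 1e-6 * np.eye(p)
    for _ in range(max_iter):
        t1 = np.zeros(p)
        t2 = np.zeros((p, p))
        for j in range(n):
            m, a = miss[j], ~miss[j]
            xj = filled[j].copy()
            corr = np.zeros((p, p))
            if m.any() and a.any():
                # E-step: conditional mean of the missing part given the
                # available part, plus the conditional covariance term.
                gain = sigma[np.ix_(m, a)] @ np.linalg.inv(sigma[np.ix_(a, a)])
                xj[m] = mu[m] + gain @ (X[j, a] - mu[a])
                corr[np.ix_(m, m)] = (sigma[np.ix_(m, m)]
                                      - gain @ sigma[np.ix_(a, m)])
            elif m.any():
                # Row with no available values: fall back to the mean.
                xj = mu.copy()
                corr = sigma.copy()
            filled[j] = xj
            t1 += xj                         # contribution to T1
            t2 += np.outer(xj, xj) + corr    # contribution to T2
        # M-step: update the mean and covariance from T1 and T2.
        new_mu = t1 / n
        new_sigma = t2 / n - np.outer(new_mu, new_mu)
        change = max(np.abs(new_mu - mu).max(),
                     np.abs(new_sigma - sigma).max())
        mu, sigma = new_mu, new_sigma
        if change < tol:
            break
    return filled, mu, sigma
```

The stopping rule here compares successive estimates of the mean and covariance, which is one common way to implement the convergence test described in the text.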

3.2. Soft Sensor Development Approach Based on the Coefficient of Determination Maximization Strategy

For the complete training sample set $X$ obtained from Section 3.1, which can be written as
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \tag{20}$$
let $(x_1, x_2, \ldots, x_{p-1})$ denote the secondary variables, and $x_p$ denote the KPI. Our objective is to estimate $x_p$ from $(x_1, x_2, \ldots, x_{p-1})$.
$R^2$ measures the fraction of the total variance in the model explained by the regression with the given variables [23]. The range of $R^2$ is $[0, 1]$. Let $x_p$ be the $y$ mentioned in Section 2.3. Then, the coefficient of determination is
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (x_{ip} - \hat{x}_{ip})^2}{\sum_{i=1}^{n} (x_{ip} - \bar{x}_p)^2}. \tag{21}$$
If the secondary variables in the soft sensor model do not account for the variance of $x_p$, the estimate of $x_{ip}$, denoted $\hat{x}_{ip}$, is exactly equal to the sample mean $\bar{x}_p$. In this case, SSR is 0 and SSE equals TSS, so $R^2 = 0$. On the other hand, if $(x_{i1}, x_{i2}, \ldots, x_{i(p-1)})$ fully explains the variance of $x_{ip}$ for $i = 1, 2, \ldots, n$, it follows that $x_{ip} = \hat{x}_{ip}$, i.e., each error is zero and SSR = TSS, so $R^2 = 1$. In general, $R^2$ does not take the extreme values 0 or 1, but instead takes a certain value between the two [28]. For the case where the number of variables, $p$, is much smaller than the sample number $n$, the closer $R^2$ is to 1, the better the model. Therefore, when the model for the KPI maximizes $R^2$, it becomes the best estimate of the KPI, that is,
$$1 - \frac{\sum_{i=1}^{n} (x_{ip} - \tilde{x}_{ip})^2}{\sum_{i=1}^{n} (x_{ip} - \bar{x}_p)^2} = \max\left[1 - \frac{\sum_{i=1}^{n} (x_{ip} - K_i)^2}{\sum_{i=1}^{n} (x_{ip} - \bar{x}_p)^2}\right] \tag{22}$$
where $\tilde{x}_{ip}$ is the best estimate of $x_{ip}$, and $K_i$ represents all possible estimates of $x_{ip}$. Simplifying the above equation gives
$$\frac{\sum_{i=1}^{n} (x_{ip} - \tilde{x}_{ip})^2}{\sum_{i=1}^{n} (x_{ip} - \bar{x}_p)^2} = \min\left[\frac{\sum_{i=1}^{n} (x_{ip} - K_i)^2}{\sum_{i=1}^{n} (x_{ip} - \bar{x}_p)^2}\right] \tag{23}$$
where $x_{ip}$ and $\bar{x}_p$ are both known values, so the denominator does not depend on $K_i$. Equation (23) can then be written as
$$\sum_{i=1}^{n} (x_{ip} - \tilde{x}_{ip})^2 = \min\left[\sum_{i=1}^{n} (x_{ip} - K_i)^2\right]. \tag{24}$$
Multiplying Equation (24) on both sides by $n^{-1}$ gives
$$\frac{1}{n} \sum_{i=1}^{n} (x_{ip} - \tilde{x}_{ip})^2 = \min\left[\frac{1}{n} \sum_{i=1}^{n} (x_{ip} - K_i)^2\right]. \tag{25}$$
Considering that the mathematical expectation of a discrete random variable is
$$E(x) = \sum_{i} x_i p_i \tag{26}$$
where $x_i$ represents the $i$th value of the random variable $x$ and $p_i$ represents its probability, Equation (25) can be expressed as
$$E\{\|x_p - \tilde{x}_p\|^2\} = \min E\{\|x_p - K\|^2\} \tag{27}$$
where $K$ denotes all possible estimates of the KPI $x_p$, and $\tilde{x}_p$ represents the best estimate of the KPI when the coefficient of determination $R^2$ is maximized. Since $x_p$ is derived from the soft sensor models and secondary variables, the above equation can be written as
$$\tilde{x}_p = \arg\min_{K} E\left[\|x_p - K\|^2 \mid (x_1, x_2, \ldots, x_{p-1})\right]. \tag{28}$$
In order to establish a more direct connection between $\tilde{x}_p$ and $(x_{i1}, x_{i2}, \ldots, x_{i(p-1)})$, the conditional expectation in Equation (28) will be simplified further. Firstly, it can be noted that $K$ does not have an impact on the simplification, that is,
$$\begin{aligned}
E\left[\|x_p - K\|^2 \mid (x_1, x_2, \ldots, x_{p-1})\right]
&= E\left[\|x_p - E(x_p \mid (x_1, x_2, \ldots, x_{p-1})) + E(x_p \mid (x_1, x_2, \ldots, x_{p-1})) - K\|^2 \mid (x_1, x_2, \ldots, x_{p-1})\right] \\
&= E\left[\|x_p - E(x_p \mid (x_1, x_2, \ldots, x_{p-1}))\|^2 \mid (x_1, x_2, \ldots, x_{p-1})\right] \\
&\quad + E\left[\|E(x_p \mid (x_1, x_2, \ldots, x_{p-1})) - K\|^2 \mid (x_1, x_2, \ldots, x_{p-1})\right] \\
&\quad + E\left[[x_p - E(x_p \mid (x_1, x_2, \ldots, x_{p-1}))]^{T} [E(x_p \mid (x_1, x_2, \ldots, x_{p-1})) - K] \mid (x_1, x_2, \ldots, x_{p-1})\right] \\
&\quad + E\left[[E(x_p \mid (x_1, x_2, \ldots, x_{p-1})) - K]^{T} [x_p - E(x_p \mid (x_1, x_2, \ldots, x_{p-1}))] \mid (x_1, x_2, \ldots, x_{p-1})\right]
\end{aligned} \tag{29}$$
In order to minimize the above equation, the following should hold:
$$K = E\left[x_p \mid (x_1, x_2, \ldots, x_{p-1})\right] \tag{30}$$
which can be rewritten as
$$\tilde{x}_p = E\left[x_p \mid (x_1, x_2, \ldots, x_{p-1})\right]. \tag{31}$$
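The step from the expansion to the minimizing choice of $K$ can be spelled out briefly. The following is a sketch in the notation of this section, with the shorthand $x^e = (x_1, x_2, \ldots, x_{p-1})$ and $m(x^e) = E(x_p \mid x^e)$ introduced here only for brevity; it assumes that $K$ may depend on the secondary variables but not on $x_p$.

```latex
\begin{aligned}
E\big[(x_p - m(x^e))^{T} (m(x^e) - K) \mid x^e\big]
  &= \big(E[x_p \mid x^e] - m(x^e)\big)^{T} (m(x^e) - K) = 0, \\
E\big[\|x_p - K\|^2 \mid x^e\big]
  &= E\big[\|x_p - m(x^e)\|^2 \mid x^e\big] + \|m(x^e) - K\|^2 .
\end{aligned}
```

Both cross terms vanish because $m(x^e) - K$ is fixed once $x^e$ is given. The first remaining term does not depend on $K$, and the second is nonnegative and equals zero only when $K = m(x^e)$, which is the condition on $K$ stated above.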
Furthermore, $E[x_p \mid (x_1, x_2, \ldots, x_{p-1})]$ can be expanded according to the definition of expectation, giving
$$\begin{aligned}
\tilde{x}_p &= E\left[x_p \mid (x_1, x_2, \ldots, x_{p-1})\right] \\
&= \int x_p \, p\left[x_p \mid (x_1, x_2, \ldots, x_{p-1})\right] dx_p \\
&= \int x_p \, \frac{p(x_1, x_2, \ldots, x_{p-1}, x_p)}{p(x_1, x_2, \ldots, x_{p-1})} \, dx_p .
\end{aligned} \tag{32}$$
Thus, this establishes the basic framework of the probabilistic soft sensor model with KPI optimal estimation.
The next part is to solve the joint probability distribution in the model.
In this paper, GMM is used to approximate the joint probability distribution. Let $p(x^e) = p(x_1, x_2, \ldots, x_{p-1})$; that is,
$$p(x^e) = \sum_{j=1}^{M} \alpha_j \, p(x_j^e \mid \theta_j) \tag{33}$$
$$p(x^e, x^p) = \sum_{l=1}^{M} \alpha_l \, p(x_l^e, x_l^p \mid \theta_l). \tag{34}$$
In order to deduce the specific representation of the KPI optimal estimation $\tilde{x}_p$ under the proposed probabilistic soft sensor model, we first introduce Lemma 1.
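Lemma 1 [30]. Let $G(x; \mu, \Sigma)$ be a multidimensional normal density function with mean $\mu$ and covariance matrix $\Sigma$. Let $x^{T} = (x_1^{T}, x_2^{T})$, $\mu^{T} = (\mu_1^{T}, \mu_2^{T})$, and $\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$; then, the joint probability density is
$$p(x) = G(x_1; \mu_1, \Sigma_{11}) \, G(x_2; \mu_{x_2 \mid x_1}, \Sigma_{x_2 \mid x_1}) \tag{35}$$
where
$$\mu_{x_2 \mid x_1} = \mu_2 - \Sigma_{21} \Sigma_{11}^{-1} (\mu_1 - x_1) \tag{36}$$
$$\Sigma_{x_2 \mid x_1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}. \tag{37}$$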
Proof. 
The details of the proof can be found in [30].
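The conditional mean and covariance in Lemma 1 can be computed directly. The sketch below assumes the joint vector is ordered as $(x_1, x_2)$ with the conditioning block first; `conditional_gaussian` is a hypothetical helper name.

```python
import numpy as np

def conditional_gaussian(mu, sigma, x1):
    """Mean and covariance of x2 given x1 for a joint Gaussian (x1, x2)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    x1 = np.atleast_1d(np.asarray(x1, dtype=float))
    k = x1.size                      # dimension of the conditioning block
    mu1, mu2 = mu[:k], mu[k:]
    s11, s12 = sigma[:k, :k], sigma[:k, k:]
    s21, s22 = sigma[k:, :k], sigma[k:, k:]
    gain = s21 @ np.linalg.inv(s11)  # Sigma_21 * Sigma_11^{-1}
    cond_mean = mu2 - gain @ (mu1 - x1)
    cond_cov = s22 - gain @ s12
    return cond_mean, cond_cov
```

For a bivariate standard normal with correlation 0.8, conditioning on $x_1 = 1$ gives a conditional mean of 0.8 and a conditional variance of $1 - 0.8^2 = 0.36$.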
Using Lemma 1, it follows that
$$p(x_l^e, x_l^p) = G(x_l; \mu_l, \Sigma_l) = G(x_l^e; \mu_l^e, \Sigma_l^{ee}) \, G(x_l^p; \mu_l^{p \mid e}, \Sigma_l^{p \mid e}) \tag{38}$$
where $\mu_l = (\mu_l^{eT}, \mu_l^{pT})$ and $\Sigma_l = \begin{bmatrix} \Sigma_l^{ee} & \Sigma_l^{ep} \\ \Sigma_l^{pe} & \Sigma_l^{pp} \end{bmatrix}$. Therefore, Equations (33) and (34) can be written as
$$p(x^e) = \sum_{j=1}^{M} \alpha_j \, G(x_j^e; \mu_j^e, \Sigma_j^{ee}) \tag{39}$$
$$p(x^e, x^p) = \sum_{l=1}^{M} \alpha_l \, G(x_l^e; \mu_l^e, \Sigma_l^{ee}) \, G(x_l^p; \mu_l^{p \mid e}, \Sigma_l^{p \mid e}). \tag{40}$$
Substituting Equations (39) and (40) into Equation (32) gives
$$\tilde{x}_p = \int x_p \, \frac{p(x^e, x^p)}{p(x^e)} \, dx_p = \int x_p \, \frac{\sum_{l=1}^{M} \alpha_l \, G(x_l^e; \mu_l^e, \Sigma_l^{ee}) \, G(x_l^p; \mu_l^{p \mid e}, \Sigma_l^{p \mid e})}{\sum_{j=1}^{M} \alpha_j \, G(x_j^e; \mu_j^e, \Sigma_j^{ee})} \, dx_p . \tag{41}$$
Extracting the sum in the numerator to outside the integral gives
$$\tilde{x}_p = \sum_{l=1}^{M} \int x_p \, \frac{\alpha_l \, G(x_l^e; \mu_l^e, \Sigma_l^{ee}) \, G(x_l^p; \mu_l^{p \mid e}, \Sigma_l^{p \mid e})}{\sum_{j=1}^{M} \alpha_j \, G(x_j^e; \mu_j^e, \Sigma_j^{ee})} \, dx_p . \tag{42}$$
In order to make the derivation more concise, the positions of some factors in the integral are changed as follows:
$$\begin{aligned}
\tilde{x}_p &= \sum_{l=1}^{M} \int \frac{\alpha_l \, G(x_l^e; \mu_l^e, \Sigma_l^{ee})}{\sum_{j=1}^{M} \alpha_j \, G(x_j^e; \mu_j^e, \Sigma_j^{ee})} \, x_p \, G(x_l^p; \mu_l^{p \mid e}, \Sigma_l^{p \mid e}) \, dx_p \\
&= \sum_{l=1}^{M} \frac{\alpha_l \, G(x_l^e; \mu_l^e, \Sigma_l^{ee})}{\sum_{j=1}^{M} \alpha_j \, G(x_j^e; \mu_j^e, \Sigma_j^{ee})} \int x_p \, G(x_l^p; \mu_l^{p \mid e}, \Sigma_l^{p \mid e}) \, dx_p .
\end{aligned} \tag{43}$$
Since the integral is the conditional expectation $\mu_l^{p \mid e}$, the above equation can be simplified to
$$\tilde{x}_p = \sum_{l=1}^{M} \frac{\alpha_l \, G(x_l^e; \mu_l^e, \Sigma_l^{ee})}{\sum_{j=1}^{M} \alpha_j \, G(x_j^e; \mu_j^e, \Sigma_j^{ee})} \, \mu_l^{p \mid e} . \tag{44}$$
Therefore, the detailed soft sensor model expression of the KPI optimal estimation is obtained.
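The resulting estimator is a weighted sum of per-component conditional means, which can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: each joint vector is ordered with the secondary variables first and the KPI last, the conditional mean of each component is computed via Lemma 1, and the names `gaussian_pdf` and `gmm_kpi_estimate` are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density G(x; mean, cov)."""
    d = x.size
    diff = x - mean
    maha = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * maha) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))

def gmm_kpi_estimate(x_e, alphas, means, covs):
    """KPI estimate given secondary variables x_e and joint GMM parameters."""
    x_e = np.atleast_1d(np.asarray(x_e, dtype=float))
    k = x_e.size
    weights, cond_means = [], []
    for alpha, mu, sig in zip(alphas, means, covs):
        mu = np.asarray(mu, dtype=float)
        sig = np.asarray(sig, dtype=float)
        mu_e, mu_p = mu[:k], mu[k:]
        s_ee, s_pe = sig[:k, :k], sig[k:, :k]
        # Unnormalized weight: alpha_l * G(x_e; mu_l^e, Sigma_l^ee).
        weights.append(alpha * gaussian_pdf(x_e, mu_e, s_ee))
        # Conditional mean of the KPI for this component (Lemma 1).
        cond_means.append(mu_p + s_pe @ np.linalg.solve(s_ee, x_e - mu_e))
    weights = np.array(weights)
    weights = weights / weights.sum()   # divide by the marginal p(x_e)
    return weights @ np.array(cond_means)
```

With a single component the estimate reduces to the ordinary conditional mean of a joint Gaussian; with several well-separated components it is dominated by the component whose marginal best explains the observed secondary variables.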
In this paper, unknown parameters in the model are estimated using the EM algorithm. The iterative equations of the EM algorithm for estimating the GMM parameters are [31]
$$\mu_l^{(i+1)} = \frac{\sum_{j=1}^{n} \gamma_{jl}^{(i+1)} X_j}{\sum_{j=1}^{n} \gamma_{jl}^{(i+1)}}, \quad \Sigma_l^{(i+1)} = \frac{\sum_{j=1}^{n} \gamma_{jl}^{(i+1)} (X_j - \mu^{(i)})^2}{\sum_{j=1}^{n} \gamma_{jl}^{(i+1)}}, \quad \alpha_l^{(i+1)} = \frac{\sum_{j=1}^{n} \gamma_{jl}^{(i+1)}}{n} \tag{45}$$
where $\gamma_{jl}$ represents the responsibility of the mixed component $l$ for the training sample $X_j$. It can be written as
$$\gamma_{jl}^{(i+1)} = \frac{\alpha_l \, p(X_j \mid \theta_l)}{\sum_{k=1}^{M} \alpha_k \, p(X_j \mid \theta_k)}. \tag{46}$$
Consequently, the above steps give the GMM parameters, and the KPI optimal estimate x ~ p follows.
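A compact Python sketch of these EM updates is given below. It is not the authors' implementation: the covariance update uses the standard outer-product form with the newly updated component mean, the initial covariances are set to the overall sample covariance, a small ridge term is added for numerical stability, and initial means must be supplied by the caller. The name `gmm_em` and its parameters are hypothetical.

```python
import numpy as np

def gmm_em(X, init_means, n_iter=100, ridge=1e-6):
    """Fit a Gaussian mixture by EM, starting from the given component means."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    means = np.array(init_means, dtype=float)
    M = means.shape[0]
    base = np.cov(X, rowvar=False).reshape(d, d) + ridge * np.eye(d)
    covs = np.stack([base.copy() for _ in range(M)])
    alphas = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        # E-step: responsibilities gamma[j, l] of component l for sample j.
        dens = np.empty((n, M))
        for l in range(M):
            diff = X - means[l]
            inv = np.linalg.inv(covs[l])
            maha = np.einsum("ij,jk,ik->i", diff, inv, diff)
            norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(covs[l]))
            dens[:, l] = alphas[l] * np.exp(-0.5 * maha) / norm
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update mixing coefficients, means, and covariances.
        nl = gamma.sum(axis=0)
        alphas = nl / n
        means = (gamma.T @ X) / nl[:, None]
        for l in range(M):
            diff = X - means[l]
            covs[l] = ((gamma[:, l, None] * diff).T @ diff / nl[l]
                       + ridge * np.eye(d))
    return alphas, means, covs
```

EM for mixtures is sensitive to initialization, so in practice the starting means are often taken from a clustering step or from several random restarts; that choice is outside what the text specifies.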

4. Case Study

In this section, the effectiveness and feasibility of the proposed soft sensor model approach based on maximizing the coefficient of determination are evaluated through an industrial aluminum electrolytic production process. To show the advantages of the probabilistic soft sensor framework, the estimations are compared with the real values. For performance evaluation, the root-mean-squared error (RMSE) index is used.
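The RMSE index is not written out in the text; the sketch below assumes the usual definition, the square root of the mean squared difference between laboratory values and soft sensor estimates. The function name `rmse` is hypothetical.

```python
import numpy as np

def rmse(actual, estimated):
    """Root-mean-squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    if actual.shape != estimated.shape:
        raise ValueError("inputs must have the same shape")
    return float(np.sqrt(np.mean((actual - estimated) ** 2)))
```

A smaller value indicates that the estimates track the measured values more closely, and a value of zero means the two sequences coincide.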

4.1. Soft Sensor Development for Industrial Aluminum Electrolytic Process

Aluminum is widely used in construction and electrical industries [32]. The main method currently used in aluminum smelting plants is the cryolite–alumina molten salt electrolysis process, in which the electrochemical reaction process takes place in an electrolytic cell. Figure 2 shows the internal structure of the electrolytic cell.
Molten cryolite is a solvent in which aluminum oxide is dissolved as a solute, forming a melt with good electrical conductivity. Carbon materials are used as cathodes and anodes, and a direct current is passed through them. The thermal energy of the direct current is used to melt the cryolite and maintain a constant electrolysis temperature. Furthermore, the electrochemical reaction occurs between the two electrodes, where the product at the cathode is aluminum liquid, and carbon dioxide and other gases are generated at the anode. The chemical reaction of the electrolytic process is
$$2\,\mathrm{Al_2O_3} + 3\,\mathrm{C} \rightarrow 4\,\mathrm{Al} + 3\,\mathrm{CO_2}. \tag{47}$$
The chemical reaction can produce gases other than carbon dioxide and carbon monoxide, as well as fluorocarbon gases. The gas purifying device uses alumina and fluorine generated in the mixed gas to produce fluorinated alumina, and the fluorinated alumina is then recycled to the electrolytic cell for chemical reaction. Figure 3 shows the process flow diagram of the aluminum electrolysis process.
The main control goal of the aluminum electrolysis process is to keep the alumina concentration in the electrolysis cell stable within a certain range, preferably between 1.5% and 3.5% [33]. The control of alumina concentration relates to energy consumption and economic benefits of the aluminum electrolytic production process. On one hand, when the alumina concentration is too low, an additional chemical reaction occurs at the anode, which can easily cause a sudden rise in the cell voltage and the energy balance of the cell is destroyed. On the other hand, when the concentration reaches saturation, if the feeder continues to add alumina at the time, the raw material will be deposited at the bottom of the cell, so that the resistance increases and the current efficiency becomes low. Therefore, it is necessary to keep the alumina concentration in the proper range.
In soft sensor development for the aluminum electrolytic process, the following measurable variables were selected as the secondary variables: the voltage $x_1$ between the two electrodes obtained by the first voltage measuring instrument; the anode conductor current $x_2$; the voltage $x_3$ between the two electrodes obtained by the second voltage measuring instrument; and the alumina concentration $x_4$ provided by an electrochemical analyzer. The interelectrode voltage refers to the voltage between the anode guide and the corresponding cathode steel bar. The alumina concentration $y$ provided by the laboratory is the primary variable for the model. Figure 4 shows a diagram of the process measurement system.
The variables $x_1(k)$, $x_2(k)$, $x_3(k)$, $x_4(k)$, and $y(k)$ form the joint probability distribution
$$p(x(k)) = p(x_1(k), x_2(k), x_3(k), x_4(k), y(k)). \tag{48}$$
The soft sensor was then developed according to the process described in Section 3 of this paper. It is assumed that M = 2.

4.2. Experimental Results

4.2.1. EM Algorithm and Missing Values

We took 600 complete data groups from the training sample set, and deleted 10%, 20%, or 30% of the alumina concentration variable data. Then, the mean substitution method, the regression interpolation method, and the EM algorithm were used to process the sample set with missing values. Table 1, Table 2 and Table 3 show the mean and RMSE of the alumina concentration sample set for the three method simulations for missing ratios of 10%, 20%, and 30%.
First, comparing the mean value, we can see from the above tables that the means of the regression interpolation method and the EM data interpolation method are closer to the mean of the real value set, and the mean substitution method is less effective. Obviously, the RMSE of the EM data interpolation method is much smaller than that of the regression interpolation method. Therefore, the accuracy and effectiveness of the EM data interpolation method in processing missing values is verified. Further, if there is a problem with missing values in the practical industrial process, the EM algorithm can be selected for data interpolation.
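The deletion-and-imputation experiment described above can be mimicked as follows. This sketch covers only the two baseline methods (mean substitution and regression interpolation); it assumes that only one column has missing values and that the other columns are complete. All function names are hypothetical, and no plant data or results from the paper are reproduced.

```python
import numpy as np

def delete_fraction(X, col, fraction, seed=0):
    """Return a copy of X with a random fraction of column `col` set to NaN."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)
    n = X.shape[0]
    idx = rng.choice(n, size=int(round(fraction * n)), replace=False)
    X[idx, col] = np.nan
    return X, idx

def mean_substitution(X, col):
    """Fill missing entries of `col` with the mean of its observed entries."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X[:, col])
    X[miss, col] = X[~miss, col].mean()
    return X

def regression_interpolation(X, col):
    """Fill missing entries of `col` by linear regression on the other columns."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X[:, col])
    others = [c for c in range(X.shape[1]) if c != col]
    A = np.column_stack([np.ones(X.shape[0]), X[:, others]])
    # Fit on complete rows only, then predict the missing rows.
    coef, *_ = np.linalg.lstsq(A[~miss], X[~miss, col], rcond=None)
    X[miss, col] = A[miss] @ coef
    return X

def imputation_rmse(truth, imputed, idx, col):
    """RMSE of the imputed values against the deleted true values."""
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((imputed[idx, col] - truth[idx, col]) ** 2)))
```

On synthetic data with a strong linear relationship between the columns, regression interpolation typically gives a much lower RMSE than mean substitution; how either compares with the EM algorithm on real plant data depends on the data themselves.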

4.2.2. Experimental Results of the Soft Sensor Model Based on Maximizing the Coefficient of Determination

In order to verify the feasibility of the proposed approach, a test sample set was used to validate the designed soft sensor model. The test sample set was divided into four subsets of 100 samples. The actual alumina concentration measurement obtained from the laboratory was compared with the output of the soft sensor model to acquire an estimated performance evaluation of the model. The results are shown in Figure 5. Figure 5a–d show the estimated alumina concentrations based on the first, second, third, and fourth test subsets, respectively. Table 4 shows the root-mean-square errors (RMSE) of the four test subsets. It can be seen that, overall, the soft sensor model based on maximizing the coefficient of determination can accurately track the overall trends in the process. The alumina concentration output by the model is approximately the same as the actual laboratory measurement.

4.2.3. Comparison with BP and LSSVM

The backpropagation (BP) neural network and the least-squares support vector machine (LSSVM) model were applied to the test sample set, and the first test subset was used for performance comparison. The parameters of the comparison algorithms were determined as follows: The number of hidden layer nodes in the BP neural network model was 100 and the activation function of the hidden layer was a sigmoid [34]. The kernel function of the LSSVM model was the radial basis function (RBF), and the kernel parameter and regularization parameter were 1 and 20, respectively [34]. For each model, the number of secondary variables was 4, and the number of primary variables was 1. It can be seen that the two comparison models need different parameters in order to achieve an accurate estimation performance, while this is not necessary for the soft sensor model based on maximizing the coefficient of determination. The estimated results are shown in Figure 6 and Figure 7. Figure 6 shows the estimated values of the soft sensor based on the BP neural network for the first test subset, and Figure 7 shows the estimated values of the soft sensor based on the LSSVM for the first test subset. It can be seen from Figure 6 that the soft sensor based on a BP neural network can roughly follow the trend of the laboratory measurements, but the error is still large at many points. It can be seen from Figure 7 that the overall performance of the soft sensor based on LSSVM is better than that based on a BP neural network, but compared with Figure 5a, it is obvious that the estimation of some extreme points is not as accurate as that given by the soft sensor based on maximizing the coefficient of determination.
Figure 8, Figure 9 and Figure 10 show the soft sensor estimates based on different modelling methods as a function of the laboratory measurements. The green circles show the BP neural network model; the purple circles the LSSVM model; and the red circles the proposed coefficient of determination maximization model. In the ideal case, the circles should lie on the blue y = x line. In practice, deviations from this behavior can provide information about the accuracy of the models. The BP neural network soft sensor produces a soft sensor system that has a consistent bias, since the values are consistently located above the y = x line. Furthermore, the bias in the LSSVM soft sensor model is smaller, but there also seems to be a calibration issue, since the data does not lie parallel to the y = x line. Finally, the proposed model has the smallest deviations and the most ideal performance.
To better illustrate the performance of the proposed soft sensor model, Table 5 shows the RMSE values for the different methods. As can be seen from Table 5, the RMSE of the proposed method is smallest, which means that the estimation effect of the proposed model is better than those of the BP neural network model and the LSSVM model.

5. Conclusions

In this paper, a new KPI estimation method for probabilistic soft sensor development is proposed based on maximizing the coefficient of determination. The joint probability distribution in the probability model is approximated using a GMM, whose parameters are estimated using the EM algorithm. In addition to providing accurate, real-time estimates of the KPIs, the method also addresses the missing values that training sample sets often contain, handling them with the EM algorithm. The resulting soft sensor design method was tested on a case study of the aluminum electrolysis process, which shows that the proposed method can provide alumina concentration estimates that are consistent with the actual measurements obtained from laboratory tests. Future work will focus on extending the proposed soft sensor development approach to dynamic, non-Gaussian, and batch processes.
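One common way to turn a joint GMM over the secondary variables and the KPI into an estimator is the conditional mean of the KPI given the secondary variables, which minimizes the mean squared error and therefore maximizes the coefficient of determination. The sketch below computes this conditional mean for given GMM parameters. It is a minimal illustration under assumptions: the function names and the example parameters are hypothetical, the KPI is taken to be the last coordinate of the joint vector, and it is not claimed to reproduce the paper's exact estimator.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at a single point x."""
    d = mean.size
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def gmm_conditional_mean(x, weights, means, covs):
    """E[y | x] for a joint GMM over [x, y], with the scalar y stored last."""
    d = x.size
    resp = np.empty(len(weights))
    comp_means = np.empty(len(weights))
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        mu_x, mu_y = mu[:d], mu[d]
        cov_xx, cov_yx = cov[:d, :d], cov[d, :d]
        # Unnormalized responsibility of component k for the observed x.
        resp[k] = w * gaussian_pdf(x, mu_x, cov_xx)
        # Conditional mean of y given x within component k.
        comp_means[k] = mu_y + cov_yx @ np.linalg.solve(cov_xx, x - mu_x)
    resp /= resp.sum()
    return float(resp @ comp_means)

# Hypothetical two-component model with one secondary variable.
weights = np.array([0.5, 0.5])
means = [np.array([-1.0, -1.0]), np.array([1.0, 1.0])]
covs = [np.eye(2), np.eye(2)]
y_at_zero = gmm_conditional_mean(np.array([0.0]), weights, means, covs)

# Hypothetical single-component model: reduces to ordinary linear regression.
single = gmm_conditional_mean(
    np.array([2.0]),
    np.array([1.0]),
    [np.array([0.0, 1.0])],
    [np.array([[1.0, 0.5], [0.5, 1.0]])],
)
```

In practice, the weights, means, and covariances would come from an EM fit to the training data rather than being specified by hand.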

Author Contributions

Y.Z. and X.Y. conceived the idea, while Y.A.W.S. and X.Y. provided assistance with the development and implementation of the methods. J.C. and C.T. provided the industrial data and the experimental set-up for the case study, respectively. Y.Z. performed the simulations and analysed the data, with assistance from X.Y. and Y.A.W.S. Y.Z. wrote the paper with editorial assistance from Y.A.W.S. and X.Y.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 61673053 and 61603034; the Beijing Natural Science Foundation, grant numbers 4162041 and 3182027; and the National Key R&D Program of China, grant number 2017YFB0306403.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shardt, Y.A.W.; Mehrkanoon, S.; Zhang, K.; Yang, X.; Suykens, J.; Ding, S.X.; Peng, K.X. Modelling the Strip Thickness in Hot Steel Rolling Mills Using Least-squares Support Vector Machines. Can. J. Chem. Eng. 2018, 96, 171–178.
  2. Zhang, K.; Shardt, Y.A.W.; Chen, Z.W.; Yang, X.; Ding, S.X.; Peng, K.X. A KPI-Based Process Monitoring and Fault Detection Framework for Large-Scale Processes. ISA Trans. 2017, 68, 276–286.
  3. Stanojevic, P.; Orlic, B.; Misita, M.; Tatalovic, N.; Lenkey, G.B. Online Monitoring and Assessment of Emerging Risk in Conventional Industrial Plants: Possible Way to Implement Integrated Risk Management Approach and KPI’s. J. Risk Res. 2013, 16, 501–512.
  4. Paulsson, D.; Gustavsson, R.; Mandenius, C.F. A Soft Sensor for Bioprocess Control Based on Sequential Filtering of Metabolic Heat Signals. Sensors 2014, 14, 17864–17882.
  5. Abeykoon, C. A novel soft sensor for real-time monitoring of the die melt temperature profile in polymer extrusion. IEEE Trans. Ind. Electron. 2014, 61, 7113–7123.
  6. Yuan, X.F.; Ge, Z.Q.; Huang, B.; Song, Z.H. A Probabilistic Just-in-Time Learning Framework for Soft Sensor Development with Missing Data. IEEE Trans. Control Syst. Technol. 2017, 25, 1124–1132.
  7. Chen, K.; Liang, Y.; Gao, Z.L. Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction. Sensors 2017, 17, 1830.
  8. Khatibisepehr, S.; Huang, B.; Khare, S. Design of inferential sensors in the process industry: A review of Bayesian methods. J. Process Control 2013, 23, 1575–1596.
  9. Serdio, F.; Lughofer, E.; Zavoianu, A.C.; Pichler, K.; Pichler, M.; Buchegger, T.; Efendic, H. Improved fault detection employing hybrid memetic fuzzy modeling and adaptive filters. Appl. Soft. Comput. 2017, 51, 60–82.
  10. Serdio, F.; Lughofer, E.; Pichler, K.; Buchegger, T.; Pichler, M.; Efendic, H. Fault detection in multi-sensor networks based on multivariate time-series models and orthogonal transformations. Inf. Fusion 2014, 20, 272–291.
  11. Shardt, Y.A.W.; Hao, H.Y.; Ding, S.X. A New Soft-Sensor-Based Process Monitoring Scheme Incorporating Infrequent KPI Measurements. IEEE Trans. Ind. Electron. 2015, 62, 3843–3851.
  12. Yan, W.W.; Shao, H.H.; Wang, X.F. Soft sensing modeling based on support vector machine and Bayesian model selection. Comput. Chem. Eng. 2004, 28, 1489–1498.
  13. Kadlec, P.; Gabrys, B.; Strandt, S. Data-driven Soft Sensors in the process industry. Comput. Chem. Eng. 2009, 33, 795–814.
  14. Shang, C.; Gao, X.Q.; Yang, F. Novel Bayesian Framework for Dynamic Soft Sensor Based on Support Vector Machine with Finite Impulse Response. IEEE Trans. Control Syst. Technol. 2014, 22, 1550–1557.
  15. Fujiwara, K.; Kano, M.; Hasebe, S. Development of correlation-based pattern recognition algorithm and adaptive soft-sensor design. In Proceedings of the IFAC Symposium on Advanced Control of Chemical Processes (ADCHEM), Istanbul, Turkey, 12–15 July 2009.
  16. Yuan, X.; Ye, L.; Bao, L.; Ge, Z.; Song, Z. Nonlinear feature extraction for soft sensor modeling based on weighted probabilistic PCA. Chemom. Intell. Lab. Syst. 2015, 147, 167–175.
  17. Geladi, P. Notes on the history and nature of partial least squares (PLS) modelling. J. Chemom. 1988, 2, 231–246.
  18. Qin, S.J.; McAvoy, T.J. Nonlinear PLS modeling using neural networks. Comput. Chem. Eng. 1992, 16, 379–391.
  19. Khatibisepehr, S.; Huang, B. Dealing with Irregular Data in Soft Sensors: Bayesian Method and Comparative Study. Ind. Eng. Chem. Res. 2008, 47, 8713–8723.
  20. Qi, F.; Huang, B.; Tamayo, E.C. A Bayesian Approach for Control Loop Diagnosis with Missing Data. AICHE J. 2010, 56, 179–195.
  21. Newman, D.A. Missing Data: Five Practical Guidelines. Organ. Res. Methods 2014, 17, 372–411.
  22. Zhang, K.K.; Gonzalez, R.; Huang, B.; Ji, G.L. Expectation-Maximization Approach to Fault Diagnosis with Missing Data. IEEE Trans. Ind. Electron. 2015, 62, 1231–1240.
  23. Shardt, Y.A.W. Statistics for Chemical and Process Engineers: A Modern Approach; Springer International Publishing: Cham, Switzerland, 2015; ISBN 978-3-319-21508-2.
  24. Feng, C.H.; Makino, Y.; Yoshimura, M.; Rodriguez, F.J. Estimation of adenosine triphosphate content in ready-to-eat sausages with different storage days, using hyperspectral imaging coupled with R statistics. Food Chem. 2018, 264, 419–426.
  25. Sezer, B.; Apaydin, H.; Bilge, G.; Boyaci, I.H. Coffee arabica adulteration: Detection of wheat, corn and chickpea. Food Chem. 2018, 264, 142–148.
  26. Sun, S.L.; Zhang, C.S.; Yu, G.Q. A Bayesian network approach to traffic flow forecasting. IEEE Trans. Intell. Transp. Syst. 2006, 7, 124–132.
  27. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 1977, 39, 1–38.
  28. Stock, J.H.; Watson, M.W. Introduction to Econometrics, 3rd ed.; Addison-Wesley: Boston, MA, USA, 2010; ISBN 978-0-13-800900-7.
  29. Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis, 6th ed.; Pearson: London, UK, 2007; ISBN 978-0-13-187715-3.
  30. Rao, C.R. Linear Statistical Inference and Its Applications; Wiley: New York, NY, USA, 1973; ISBN 978-0-47-031643-6.
  31. Bilmes, J.A. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov model. Int. Comput. Sci. Inst. 1998, 4, 126.
  32. Mouedhen, G.; Feki, M.; Wery, M.D.P.; Ayedi, H.F. Behavior of aluminum electrodes in electrocoagulation process. J. Hazard. Mater. 2008, 150, 124–135.
  33. Yao, Y.C.; Cheung, C.Y.; Bao, J.; Skyllas-Kazacos, M.; Welch, B.; Akhmetov, S. Estimation of spatial alumina concentration in an aluminium reduction cell using a multilevel state observer. AICHE J. 2017, 63, 2806–2818.
  34. Zhang, S.; Zhang, T.; Yin, Y.X.; Xiao, W.D. Alumina concentration detection based on the kernel extreme learning machine. Sensors 2017, 17, 2002.
Figure 1. The flow chart of soft sensor development process.
Figure 2. The internal structure of the aluminum electrolytic cell.
Figure 3. The process flow diagram of the aluminum electrolysis process.
Figure 4. Schematic diagram of the variable collection system.
Figure 5. The soft-sensor-estimated alumina concentrations, based on maximizing the coefficient of determination, compared with the actual laboratory measurement using (a) the first test subset, (b) the second test subset, (c) the third test subset, and (d) the fourth test subset.
Figure 6. The estimated values of the soft sensor based on a backpropagation (BP) network compared with actual laboratory measurements.
Figure 7. The estimated values of the soft sensor based on LSSVM compared with actual laboratory measurements.
Figure 8. Comparison between the soft sensor based on a BP neural network and laboratory measurements.
Figure 9. Comparison between the soft sensor based on LSSVM and laboratory measurements.
Figure 10. Comparison between the soft sensor based on maximizing the coefficient of determination and laboratory measurements.
Table 1. Comparison of three data interpolation methods for a 10% missing rate.

|      | Mean Substitution Method | Regression Interpolation Method | EM Algorithm | Real Value |
|------|--------------------------|---------------------------------|--------------|------------|
| Mean | 2.4133                   | 2.4225                          | 2.4225       | 2.4259     |
| RMSE | 0.0867                   | 0.4209                          | 0.0698       | 0          |

Table 2. Comparison of three data interpolation methods for a 20% missing rate.

|      | Mean Substitution Method | Regression Interpolation Method | EM Algorithm | Real Value |
|------|--------------------------|---------------------------------|--------------|------------|
| Mean | 2.4139                   | 2.4217                          | 2.4215       | 2.4259     |
| RMSE | 0.1451                   | 0.4075                          | 0.1361       | 0          |

Table 3. Comparison of three data interpolation methods for a 30% missing rate.

|      | Mean Substitution Method | Regression Interpolation Method | EM Algorithm | Real Value |
|------|--------------------------|---------------------------------|--------------|------------|
| Mean | 2.4140                   | 2.4204                          | 2.41198      | 2.4259     |
| RMSE | 0.1700                   | 0.4068                          |              | 0          |

Table 4. The RMSE values of the four test subsets.

| Test Subset | RMSE   |
|-------------|--------|
| First       | 0.0231 |
| Second      | 0.0145 |
| Third       | 0.0209 |
| Fourth      | 0.0155 |

Table 5. The comparison of the RMSE between the three modelling methods.

| Method                                      | RMSE   |
|---------------------------------------------|--------|
| BP neural network                           | 0.0616 |
| LSSVM                                       | 0.0431 |
| Maximizing the Coefficient of Determination | 0.0231 |

Share and Cite

MDPI and ACS Style

Zhang, Y.; Yang, X.; Shardt, Y.A.W.; Cui, J.; Tong, C. A KPI-Based Probabilistic Soft Sensor Development Approach that Maximizes the Coefficient of Determination. Sensors 2018, 18, 3058. https://doi.org/10.3390/s18093058
