Article

Product Design Time Forecasting by Kernel-Based Regression with Gaussian Distribution Weights

1 MOE Key Laboratory of Measurement and Control of Complex Systems of Engineering, School of Automation, Southeast University, Nanjing 210096, China
2 Department of Automation, Yancheng Institute of Technology, Yancheng 224051, China
* Author to whom correspondence should be addressed.
Entropy 2016, 18(6), 231; https://doi.org/10.3390/e18060231
Submission received: 20 April 2016 / Revised: 16 June 2016 / Accepted: 16 June 2016 / Published: 21 June 2016
(This article belongs to the Special Issue Information Theoretic Learning)

Abstract

Design time forecasting suffers from the problems of small samples and heteroscedastic noise. To solve them, a kernel-based regression with Gaussian distribution weights (GDW-KR) is proposed here. GDW-KR maintains a Gaussian distribution over the weight vector of the regression and seeks the least informative distribution among those that keep each target value within the confidence interval of its forecast value. Inheriting the merits of Gaussian margin machines, GDW-KR simultaneously offers a point forecast and its confidence interval, thus providing more information about product design time. Our experiments with real examples verify the effectiveness and flexibility of GDW-KR.


1. Introduction

Product design is a complex and dynamic process, and its duration is affected by a number of factors, most of which are fuzzy, random and uncertain. As product design tasks occur in different companies, these uncertain characteristics may vary from product to product; heteroscedasticity thus constitutes another important feature of product design. The mapping from the factors to design time is highly nonlinear and cannot be described by definite mathematical models. How reasonable the supposed distribution of product design time is therefore becomes a key factor in product development control and decisions [1,2,3].
Cho and Eppinger [1] chose the triangular probability distribution to represent design task durations and proposed a process modeling and analysis technique for managing complex design projects by using advanced simulation. However, if the assumed distribution of design activity durations does not reflect the true state, the proposed algorithm may fail to obtain ideal results. Yan and Wang [2] proposed a time-computing model with its corresponding design activities in the concurrent product development process. Yang and Zhang [3] presented an evolution and sensitivity design-structure matrix to reflect overlapping and its impact on the degree of activity sensitivity and evolution in the process model; the model can be used for better project planning and control by identifying overlapping and risk for process improvement. With the two algorithms mentioned above, however, the normal duration of each design activity must be determined before the algorithm is executed, and if the assumed activity durations are incompatible with the actual ones, the algorithms may fail to function well. Apparently, the accuracy of the predetermined design time is crucial to the planning and control of product development processes.
Traditionally, approximate design time is estimated by qualitative approaches. With the rapid development of computer and regression techniques, new forecast methods keep emerging. Bashir and Thomson [4] proposed a modified Norden model to estimate project duration in conjunction with an effort-estimation model. Griffin [5] related the length of the product development cycle to project, process and team structure factors by a statistical method, and quantified the impact of project newness and complexity on the increasing length of the development cycle, but without a proposal for design time forecasting. Jacome and Lapinskii [6] developed a model to forecast electronic product design effort based on a structure and process decomposition approach; only a small portion of the time factors, however, are taken into account by the model. Xu and Yan [7] proposed a design-time forecast model based on a fuzzy neural network, which exhibits good performance when the sample data are sufficient. In practice, however, only a small number of design cases are available to a company, which weakens the validity of the fuzzy neural network. Therefore, a novel approach should be adopted.
Recently, kernel methods have been identified as one of the leading means for pattern classification and function approximation, and have been successfully applied in various fields [8,9,10,11,12,13,14]. The support vector machine (SVM), initially developed by Vapnik for pattern classification, is one of the most widely used models. With the introduction of the ε-insensitive loss function, the SVM has been extended to nonlinear regression problems, where it is called support vector regression (SVR). The ε-insensitive loss function contributes to the sparseness property of SVR, but the value of ε, chosen a priori, is hard to determine. A new parameter v was then introduced and v-SVR proposed, whereby v controls the number of support vectors and training errors [11]; v-SVR thus overcomes the difficulty of determining ε. In recent years, much research has been done on kernel methods. Kivinen et al. considered online learning in a reproducing kernel Hilbert space in [15]. Liu et al. [16] proved that the kernel least-mean-square algorithm can be well posed in reproducing kernel Hilbert spaces without adding an extra regularization term to penalize solution norms, as was suggested by [15]. Chen et al. developed a quantized kernel least mean square algorithm based on a simple online vector quantization method in [17], and proposed the quantized kernel recursive least squares algorithm in [18]. Wu et al. [19] derived the kernel recursive maximum correntropy algorithm in kernel space under the maximum correntropy criterion. Furthermore, by combining fuzzy theory with v-SVR, Yan and Xu [20] proposed Fv-SVM to forecast design time, which can be used to solve regression problems with uncertain input variables. However, both Fv-SVM and v-SVR assume that the noise level is uniform throughout the domain, or at least that its functional dependency is known beforehand [21]. It is thus clear that design time forecasting based on Fv-SVM is deficient simply due to the heteroscedasticity of product design. For better planning and control of the product development process, a good forecast method is expected to yield not only highly precise forecast values but also valid forecast intervals.
In Gaussian margin machines [22], the weight vector of a binary classifier maintains a Gaussian distribution, and what is sought is the least informative distribution that classifies the training samples with a high probability. Gaussian margin machines provide the probability that a sample belongs to a certain class. The idea behind Gaussian margin machines is extended here to regression for the forecast of product design time. Shang and Yan [23] proposed Gaussian margin regression (GMR) by combining Gaussian margin machines and kernel-based regression. However, GMR assumes that the forecast variances are the same, which is inconsistent with the heteroscedasticity that exists in design time forecasting. Like Fv-SVM, GMR also fails to provide valid forecast intervals. By combining Gaussian margin machines and the extreme learning machine [24,25], a confidence-weighted extreme learning machine was proposed for regression problems with large samples [26].
The present study adopts kernel-based regression with Gaussian distribution weights (GDW-KR), which combines Gaussian margin machines with kernel-based regression, aiming to solve the problems of small samples and heteroscedastic noise in design time forecasting while providing both forecast values and intervals. Inheriting the merits of Gaussian margin machines, GDW-KR maintains a Gaussian distribution over weight vectors, seeking the least informative distribution that makes each target value fall within its corresponding confidence interval. The optimization problem of GDW-KR is simplified, and an approximate solution of the simplified problem is obtained by using the results of regularized kernel-based regression. On the basis of this model, a forecast method for product design time and its relevant parameter-determining algorithm are then put forward.
The rest of this paper is organized as follows: Gaussian margin machines are introduced in Section 2. GDW-KR and the method for solving the optimization problem are described in Section 3. In Section 4, the application in injection mold design is presented, and GDW-KR is then compared with other models. An extended application of GDW-KR is also given. Section 5 draws the final conclusions.

2. Gaussian Margin Machines

Suppose the samples $\{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^m$ is a column vector and $y_i \in \{-1, +1\}$ is a scalar output. The weight vector $w$ of a linear classifier is supposed to follow a multivariate normal distribution $N_m(\mu_1, \Sigma_1)$ with mean $\mu_1 \in \mathbb{R}^m$ and covariance matrix $\Sigma_1 \in \mathbb{R}^{m \times m}$. For the sample $x_i$, we get the normal distribution:
$$ x_i^T w \sim N\!\left(x_i^T \mu_1,\; x_i^T \Sigma_1 x_i\right). $$
The linear classifier is designed to properly classify each sample with a high probability, that is:
$$ \Pr\!\left(y_i x_i^T w \ge 0\right) \ge \rho, $$
where $\rho \in (0.5, 1]$ is the confidence value.
By combining Equations (1) and (2), we get:
$$ \Pr\!\left(\frac{y_i x_i^T w - y_i x_i^T \mu_1}{\sqrt{x_i^T \Sigma_1 x_i}} \le -\frac{y_i x_i^T \mu_1}{\sqrt{x_i^T \Sigma_1 x_i}}\right) \le 1 - \rho. $$
Gaussian margin machines (GMM) aim to seek the least informative distribution that classifies the training set with high probability, which is achieved by seeking a multivariate normal distribution $N_m(\mu_1, \Sigma_1)$ with minimum Kullback-Leibler divergence with respect to an isotropic distribution $N_m(0, a I_m)$. The Kullback-Leibler divergence between $N_m(\mu_1, \Sigma_1)$ and $N_m(0, a I_m)$ is denoted by $D_{\mathrm{KL}}(N_m(\mu_1, \Sigma_1)\,\|\,N_m(0, a I_m))$ (the subscript KL abbreviates Kullback-Leibler and $D$ abbreviates divergence), and is obtained by calculating:
$$ \frac{1}{2}\ln\det\!\left(a I_m \Sigma_1^{-1}\right) + \frac{1}{2}\operatorname{tr}\!\left((a I_m)^{-1}\left(\mu_1\mu_1^T + \Sigma_1 - a I_m\right)\right). $$
The optimization problem of GMM is described as:
$$ \begin{aligned} \min_{\mu_1,\Sigma_1}\;& D_{\mathrm{KL}}\!\left(N_m(\mu_1,\Sigma_1)\,\|\,N_m(0, a I_m)\right) \\ \text{s.t.}\;& \Pr\!\left(\frac{y_i x_i^T w - y_i x_i^T \mu_1}{\sqrt{x_i^T \Sigma_1 x_i}} \le -\frac{y_i x_i^T \mu_1}{\sqrt{x_i^T \Sigma_1 x_i}}\right) \le 1 - \rho, \\ & \Sigma_1 \succ 0, \quad i = 1, \ldots, l. \end{aligned} $$
Since $(y_i x_i^T w - y_i x_i^T \mu_1)/\sqrt{x_i^T \Sigma_1 x_i}$ follows the standard normal distribution, the chance constraint in Equation (5) is equivalent to $\Phi\!\left(-y_i x_i^T \mu_1/\sqrt{x_i^T \Sigma_1 x_i}\right) \le 1 - \rho$, i.e., $y_i x_i^T \mu_1 \ge \Phi^{-1}(\rho)\sqrt{x_i^T \Sigma_1 x_i}$. After omitting the constant terms in the objective function and transforming the constraints of Equation (5) in this way, we get:
$$ \begin{aligned} \min_{\mu_1,\Sigma_1}\;& \frac{1}{2}\left(-\ln\det\Sigma_1 + \frac{1}{a}\operatorname{tr}(\Sigma_1) + \frac{1}{a}\mu_1^T\mu_1\right) \\ \text{s.t.}\;& y_i x_i^T \mu_1 \ge \Phi^{-1}(\rho)\sqrt{x_i^T \Sigma_1 x_i}, \\ & \Sigma_1 \succ 0, \quad i = 1, \ldots, l, \end{aligned} $$
where $\Phi^{-1}(\rho)$ is the inverse cumulative distribution function of the standard normal distribution; it further equals $\sqrt{2}\,\mathrm{erf}^{-1}(2\rho - 1)$, where $\mathrm{erf}^{-1}$ denotes the inverse Gauss error function.
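For concreteness, this quantile can be computed numerically. The following short Python sketch (our illustration; the paper's experiments were carried out in MATLAB) checks the identity $\Phi^{-1}(\rho) = \sqrt{2}\,\mathrm{erf}^{-1}(2\rho - 1)$ for $\rho = 0.975$, which yields the threshold of about 1.96 used later for the forecast intervals.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import erfinv

rho = 0.975                                      # confidence value in (0.5, 1]
q_ppf = norm.ppf(rho)                            # inverse CDF of the standard normal
q_erf = np.sqrt(2.0) * erfinv(2.0 * rho - 1.0)   # sqrt(2) * erfinv(2*rho - 1)

print(q_ppf, q_erf)                              # both are about 1.959964
assert np.isclose(q_ppf, q_erf)
```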
Theorem 1. 
The training samples $\{(x_i, y_i)\}_{i=1}^{l}$ are given, and a prior distribution $N_m(\mu_0, \Sigma_0)$ over the weight vector is set. Then, for any $\delta \in (0, 1]$ and any posterior distribution $N_m(\mu_1, \Sigma_1)$, the following holds with probability at least $1 - \delta$:
$$ \varphi\!\left(N_m(\mu_1,\Sigma_1), D\right) \le C_1\,\frac{1}{l}\sum_{i=1}^{l}\Phi\!\left(\frac{-y_i x_i^T \mu_1}{\sqrt{x_i^T \Sigma_1 x_i}}\right) + C_2\,\frac{D_{\mathrm{KL}}\!\left(N_m(\mu_1,\Sigma_1)\,\|\,N_m(\mu_0,\Sigma_0)\right) + \ln\frac{2\sqrt{l}}{\delta}}{l - 1}, $$
where $\varphi(N_m(\mu_1,\Sigma_1), D) = E\!\left[\varphi(w,(x,y)) \mid (x,y) \sim D,\ w \sim N_m(\mu_1,\Sigma_1)\right]$, $\varphi(w,(x,y))$ is the 0–1 loss function, $C_1 = 1 + \sqrt{2}/2$, $C_2 = 2 + \sqrt{2}/2$, and $D$ is the distribution of $(x, y)$ [22,27].
Proof of Theorem 1. 
See Appendix. □

3. Kernel-Based Regression with Gaussian Distribution Weights

3.1. Optimization Problem of GDW-KR

A finite number of independent non-duplicate observations $\{(x_i, t_i)\}_{i=1}^{l}$ with $x_i \in \mathbb{R}^m$ and $t_i \in \mathbb{R}$ are considered. A kernel-based regression model approximates the unknown regression function $f(x)$ as follows:
$$ \hat{f}(x) = \sum_{j=1}^{l} w_j\, k(x, x_j), $$
where $k(x, x_j)$ is a predefined kernel function, and $w = (w_1, \ldots, w_l)^T$.
Definition 1. 
(Kernel function) A kernel is a function $k$ that for all $x, z$ from a space $\chi$ (which need not be a vector space) satisfies:
$$ k(x, z) = \langle \phi(x), \phi(z) \rangle, $$
where $\phi$ is a mapping from the space $\chi$ to a Hilbert space $F$ that is usually called the feature space: $\phi: x \in \chi \mapsto \phi(x) \in F$ [28].
By assuming $w \sim N_l(\mu, \Sigma)$ with $\mu \in \mathbb{R}^l$ and a positive definite covariance matrix $\Sigma \in \mathbb{R}^{l \times l}$, we maintain a distribution over alternative weight vectors rather than committing to a single specific vector. Let $y_i$ denote the value forecast by the model for a given observation $x_i$; we obtain:
$$ y_i \sim N\!\left(K_i \mu,\; K_i \Sigma K_i^T\right), $$
where $K_i$ is the $i$th row of the symmetric kernel matrix $K$, and $K_{ij} = k(x_i, x_j)$, $i = 1, \ldots, l$, $j = 1, \ldots, l$. The weight vectors are required to make the target value be included in the confidence interval of the forecast value. Thus, we have the following constraint conditions:
$$ K_i\mu - \eta\sqrt{K_i \Sigma K_i^T} \le t_i, \quad t_i \le K_i\mu + \eta\sqrt{K_i \Sigma K_i^T}, \quad i = 1, \ldots, l. $$
The confidence interval needs to be large enough to impose a high confidence level. To make the level no lower than 95%, $\eta$ should be at least 1.96, computed by $\Phi^{-1}(1 - (1 - 0.95)/2)$. Considering the independence of the noise between samples, $K_i \Sigma K_j^T$ is set to 0 for $j \ne i$. Since the row vector $K_i$ cannot be a zero vector and $\Sigma$ is positive definite, we have $K_i \Sigma K_i^T > 0$. Hence, the covariance matrix $K \Sigma K^T$ should be a positive definite diagonal matrix:
$$ K_i \Sigma K_j^T = 0 \ (j \ne i), \quad \Sigma \succ 0, \quad i = 1, \ldots, l, \ j = 1, \ldots, l, $$
which indicates that the kernel matrix $K$ must be invertible, because $\operatorname{rank}(K \Sigma K^T) = l$ and $\operatorname{rank}(K \Sigma K^T) \le \operatorname{rank}(K) \le l$.
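As a quick illustration of this invertibility requirement, the sketch below builds a kernel matrix $K$ for a set of distinct toy inputs, assuming the Gaussian RBF kernel adopted later in Section 3.5, and verifies that $K$ is symmetric positive definite and hence invertible (the data and function names are illustrative only).

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """Kernel matrix K with K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.random((10, 6))             # 10 distinct observations with 6 features (toy data)
K = rbf_kernel_matrix(X, sigma=1.0)

eigvals = np.linalg.eigvalsh(K)     # K is symmetric, so use eigvalsh
print(eigvals.min() > 0)            # positive definite => rank(K) = l, i.e., K is invertible
```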
Under the constraint conditions (11) and (12), GDW-KR aims at the least informative distribution, that is, the one with the smallest Kullback-Leibler divergence with respect to an isotropic Gaussian distribution $N_l(0, a I_l)$ for some constant scalar $a > 0$. Thus, the optimization problem of GDW-KR is expressed as:
$$ \begin{aligned} \min_{\mu,\Sigma}\;& -\frac{1}{2}\ln\det\Sigma + \frac{1}{2a}\operatorname{tr}(\Sigma) + \frac{1}{2a}\mu^T\mu \\ \text{s.t.}\;& K_i\mu - \eta\sqrt{K_i \Sigma K_i^T} \le t_i, \\ & t_i \le K_i\mu + \eta\sqrt{K_i \Sigma K_i^T}, \\ & K_i \Sigma K_j^T = 0 \ (j \ne i), \\ & \Sigma \succ 0, \quad i = 1, \ldots, l, \ j = 1, \ldots, l. \end{aligned} $$

3.2. Simplification of Optimization Problem

In the problem (13), the number of unknown parameters is $l + l(l+1)/2$, which can be lowered by properly handling its constraints. First of all, let us suppose:
$$ K \Sigma K^T = \Lambda, $$
where $\Lambda = \operatorname{diag}(\lambda_1^2, \ldots, \lambda_l^2)$ and $\lambda_i > 0$, $i = 1, \ldots, l$. If the diagonal elements of $\Lambda$ are treated as unknown parameters taking the place of $\Sigma$, the number of unknown parameters in the problem (13) is reduced to $2l$. Then, the objective function of Equation (13) is rewritten as:
$$ \min_{\mu,\Lambda}\; -\frac{1}{2}\ln\det\!\left(K^{-1}\Lambda K^{-1}\right) + \frac{1}{2a}\mu^T\mu + \frac{1}{2a}\operatorname{tr}\!\left(K^{-1}\Lambda K^{-1}\right). $$
As $\ln\det(K^{-1}\Lambda K^{-1}) = \ln\det(K^{-1}K^{-1}\Lambda)$, we have:
$$ -\frac{1}{2}\ln\det\!\left(K^{-1}\Lambda K^{-1}\right) = -\sum_{i=1}^{l}\ln\lambda_i - \frac{1}{2}\ln\det P, $$
where $P = K^{-1}K^{-1}$. Since $\operatorname{tr}(K^{-1}\Lambda K^{-1}) = \operatorname{tr}(K^{-1}K^{-1}\Lambda)$ and both $K^{-1}$ and $K$ are symmetric and invertible matrices, we obtain:
$$ \frac{1}{2a}\operatorname{tr}\!\left(K^{-1}K^{-1}\Lambda\right) = \frac{1}{2a}\sum_{i=1}^{l}(P)_{ii}\lambda_i^2, $$
where $(P)_{ii} > 0$. Disregarding the constant term $-\frac{1}{2}\ln\det P$ in the objective function, the problem (13) is rewritten as:
$$ \begin{aligned} \min_{\mu,\lambda}\;& -\sum_{i=1}^{l}\ln\lambda_i + \frac{1}{2a}\sum_{i=1}^{l}(P)_{ii}\lambda_i^2 + \frac{1}{2a}\mu^T\mu \\ \text{s.t.}\;& K_i\mu - \eta\lambda_i \le t_i, \quad t_i \le K_i\mu + \eta\lambda_i, \quad \lambda_i > 0, \quad i = 1, \ldots, l. \end{aligned} $$
Assuming $\lambda_i = \lambda$ for $i = 1, \ldots, l$, the problem of GMR is obtained as:
$$ \begin{aligned} \min_{\mu,\lambda}\;& -l\ln\lambda + \frac{\lambda^2}{2a}\sum_{i=1}^{l}(P)_{ii} + \frac{1}{2a}\mu^T\mu \\ \text{s.t.}\;& K_i\mu - \eta\lambda \le t_i, \quad t_i \le K_i\mu + \eta\lambda, \quad \lambda > 0, \quad i = 1, \ldots, l. \end{aligned} $$
Comparing the problems (18) and (19) reveals that GMR is a special case of GDW-KR.

3.3. Analysis of Optimization Problem

In principle, proper generalization of GDW-KR can be guaranteed by Theorem 1, which is based on the two-sided PAC-Bayesian theorem. Here, however, the generalization of GDW-KR is established by analyzing Equation (18) on the basis of the empirical Rademacher complexity [29].
Definition 2. 
(Empirical Rademacher complexity) Let $G$ be a family of functions mapping from $X$ to $[a, b]$ and $(x_1, \ldots, x_l)$ a fixed sample of size $l$ with elements in $X$. Then, the empirical Rademacher complexity of $G$ with respect to $(x_1, \ldots, x_l)$ is defined as:
$$ \hat{S}(G) = E_\sigma\!\left[\sup_{g \in G}\left|\frac{2}{l}\sum_{i=1}^{l}\sigma_i g(x_i)\right|\right], $$
where $\sigma = (\sigma_1, \ldots, \sigma_l)^T$ and the $\sigma_i$ are independent uniform random variables taking values in $\{-1, +1\}$ [30].
Theorem 2. 
GDW-KR generalizes properly, which is guaranteed by keeping the balance between the empirical Rademacher complexity and the fitting error.
Proof. 
The objective function of the problem (18) is rewritten (up to the positive factor $a$, which does not change the minimizer) as:
$$ -a\sum_{i=1}^{l}\ln\lambda_i + \frac{1}{2}\sum_{i=1}^{l}(P)_{ii}\lambda_i^2 + \frac{1}{2}\mu^T\mu. $$
Suppose the function set is as follows:
$$ Q_c = \left\{ \left. \sum_{j=1}^{l}\mu_j k(x, x_j) \;\right|\; x \in \mathbb{R}^m,\ \mu \in \mathbb{R}^l,\ \mu^T K \mu \le c^2 \right\}, $$
where $c$ is a positive real number. Let $\hat{S}(Q_c)$ denote the empirical Rademacher complexity of $Q_c$.
Suppose another function set is defined as:
$$ H_c = \left\{ \langle \beta, \phi(x) \rangle \;\middle|\; \|\beta\| \le c \right\}, $$
where $\phi$ is the feature mapping corresponding to the kernel $k$.
For any $h(x)$ in $H_c$, letting $\beta = \sum_{i=1}^{l}\mu_i\phi(x_i)$ gives:
$$ h(x) = \langle \beta, \phi(x) \rangle = \left\langle \sum_{i=1}^{l}\mu_i\phi(x_i), \phi(x) \right\rangle = \sum_{i=1}^{l}\mu_i k(x, x_i), $$
and:
$$ \|\beta\|^2 = \left\langle \sum_{i=1}^{l}\mu_i\phi(x_i), \sum_{j=1}^{l}\mu_j\phi(x_j) \right\rangle = \sum_{i,j=1}^{l}\mu_i\mu_j\langle\phi(x_i),\phi(x_j)\rangle = \sum_{i,j=1}^{l}\mu_i\mu_j k(x_i, x_j) = \mu^T K \mu. $$
Then, $H_c$ is a superset of $Q_c$. Based on the derivation in [30], we obtain $\hat{S}(Q_c) \le \hat{S}(H_c)$ and the following:
$$ \begin{aligned} \hat{S}(H_c) &= E_\sigma\!\left[\sup_{h \in H_c}\left|\frac{2}{l}\sum_{i=1}^{l}\sigma_i h(x_i)\right|\right] = E_\sigma\!\left[\sup_{\|\beta\| \le c}\left|\left\langle \beta, \frac{2}{l}\sum_{i=1}^{l}\sigma_i\phi(x_i)\right\rangle\right|\right] \\ &\le \frac{2c}{l}\,E_\sigma\!\left[\left\|\sum_{i=1}^{l}\sigma_i\phi(x_i)\right\|\right] = \frac{2c}{l}\,E_\sigma\!\left[\left\langle \sum_{i=1}^{l}\sigma_i\phi(x_i), \sum_{j=1}^{l}\sigma_j\phi(x_j)\right\rangle^{1/2}\right] \\ &\le \frac{2c}{l}\left(E_\sigma\!\left[\sum_{i,j=1}^{l}\sigma_i\sigma_j k(x_i, x_j)\right]\right)^{1/2} = \frac{2c}{l}\left(\sum_{i=1}^{l}k(x_i, x_i)\right)^{1/2}. \end{aligned} $$
Then, we have:
$$ \hat{S}(Q_c) \le \frac{2c\sqrt{\operatorname{tr}(K)}}{l}. $$
In view of Equation (21), $c$ can be minimized by minimizing $\mu^T K \mu$. The Cauchy–Schwarz inequality yields:
$$ \mu^T K \mu = \langle \mu, K\mu \rangle \le \|\mu\|\,\|K\mu\| \le \|K\|\,\|\mu\|^2. $$
Since the kernel function is predefined, minimizing the term $\frac{1}{2}\mu^T\mu$ in Equation (20) therefore reduces the empirical Rademacher complexity of $Q_c$.
Under the constraints of the problem (18), the smaller $\lambda_i$, the smaller the fitting error. The term:
$$ -a\sum_{i=1}^{l}\ln\lambda_i + \frac{1}{2}\sum_{i=1}^{l}(P)_{ii}\lambda_i^2 $$
prevents $\lambda_i$ from getting too small or too large, and thus the model $\sum_{j=1}^{l}\mu_j k(x, x_j)$ is kept from overfitting and underfitting the training data. So the term can be taken as a special loss function. Thereby, it can be concluded that proper values of $a$ and $\eta$ guarantee the balance between the empirical Rademacher complexity and the fitting error. Thus, GDW-KR promises a desirable generalization performance. Then, we have Theorem 2. □
Theorem 2 shows that balancing the empirical Rademacher complexity and the fitting loss is consistent with the two-sided PAC-Bayesian theorem for GDW-KR.

3.4. Solution of Optimization Problem

The results of regularized kernel-based regression are used to obtain the approximate solution of the problem (18). Regularized kernel-based regression is described as:
$$ \begin{aligned} \min_{\mu,\varepsilon}\;& \frac{1}{2}\left(\mu^T\mu + C\sum_{i=1}^{l}\varepsilon_i^2\right) \\ \text{s.t.}\;& K_i\mu - t_i = \varepsilon_i, \quad i = 1, \ldots, l, \end{aligned} $$
where C is the regularization parameter.
Let $\bar{\mu}$ be the solution to Equation (27). Using the KKT conditions, $\bar{\mu}$ is analytically computed as:
$$ \bar{\mu} = \left(\frac{I}{C} + K^T K\right)^{-1} K^T t, $$
where $t = (t_1, \ldots, t_l)^T$.
Then, assuming that $\mu$ is fixed at $\bar{\mu}$ and ignoring the term $\frac{1}{2a}\mu^T\mu$ in the objective function, we rewrite Equation (18) as:
$$ \min_{\lambda_i}\; -\ln\lambda_i + \frac{(P)_{ii}}{2a}\lambda_i^2 \quad \text{s.t.}\; \lambda_i \ge t_i^*, \ \lambda_i > 0, $$
where $t_i^* = |K_i\bar{\mu} - t_i|/\eta$, $i = 1, \ldots, l$. The second derivative of the objective function of the problem (29) is $\lambda_i^{-2} + (P)_{ii}/a$, which is larger than 0 when $\lambda_i > 0$. Let $\bar{\lambda}_i$ be the solution to Equation (29), which is determined by:
$$ \bar{\lambda}_i = \begin{cases} \sqrt{a/(P)_{ii}}, & t_i^* \le \sqrt{a/(P)_{ii}}; \\ t_i^*, & t_i^* > \sqrt{a/(P)_{ii}}. \end{cases} $$
Thus, the algorithm consists of the following steps:
  • Step 1: Make independent non-duplicate observations $\{(x_i, t_i)\}_{i=1}^{l}$.
  • Step 2: Select the kernel function, and choose the proper relevant parameter(s).
  • Step 3: Compute $K^{-1}$ and $P$.
  • Step 4: Solve the problem (27), and let μ ¯ be its solution.
  • Step 5: Substitute $K$ and $\bar{\mu}$ into Equation (18), and obtain $\bar{\lambda}$ from Equation (30).
For the observation x , the forecast value is:
$$ s^T\bar{\mu} = \sum_{j=1}^{l}\bar{\mu}_j k(x, x_j), $$
where $\bar{\mu} = (\bar{\mu}_1, \ldots, \bar{\mu}_l)^T$ and $s = (k(x, x_1), \ldots, k(x, x_l))^T$. The forecast interval is calculated as:
$$ \left[ s^T\bar{\mu} - \eta^*\sqrt{s^T\bar{\Sigma}s}, \;\; s^T\bar{\mu} + \eta^*\sqrt{s^T\bar{\Sigma}s} \right], $$
where $\bar{\Sigma} = K^{-1}\bar{\Lambda}K^{-1}$ and $\eta^* > 0$.
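The five steps above can be condensed into a short numerical sketch. The following Python/NumPy code is a best-effort illustration under the stated assumptions, not the authors' MATLAB implementation; all function and variable names are our own. It computes the closed-form solution of the regularized kernel-based regression (Equation (28)), the $\bar{\lambda}_i$ values from Equation (30), and the forecast value and interval from Equations (31) and (32).

```python
import numpy as np

def rbf(u, v, sigma):
    """Gaussian RBF kernel k(u, v) = exp(-||u - v||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2))

def fit_gdw_kr(X, t, sigma, C, a, eta=1.96):
    """Steps 1-5: return (mu_bar, Sigma_bar) of the GDW-KR model."""
    l = X.shape[0]
    K = np.array([[rbf(X[i], X[j], sigma) for j in range(l)] for i in range(l)])
    K_inv = np.linalg.inv(K)
    P = K_inv @ K_inv                                  # P = K^{-1} K^{-1}
    # regularized kernel-based regression: mu_bar = (I/C + K^T K)^{-1} K^T t
    mu_bar = np.linalg.solve(np.eye(l) / C + K.T @ K, K.T @ t)
    # closed-form lambda_i: max(t_i^*, sqrt(a / P_ii)), with t_i^* = |K_i mu_bar - t_i| / eta
    t_star = np.abs(K @ mu_bar - t) / eta
    lam = np.maximum(t_star, np.sqrt(a / np.diag(P)))
    Sigma_bar = K_inv @ np.diag(lam ** 2) @ K_inv      # Sigma_bar = K^{-1} Lambda K^{-1}
    return mu_bar, Sigma_bar

def forecast(x, X, mu_bar, Sigma_bar, sigma, eta_star=1.96):
    """Point forecast and forecast interval for a new observation x."""
    s = np.array([rbf(x, xj, sigma) for xj in X])
    y = s @ mu_bar
    half = eta_star * np.sqrt(s @ Sigma_bar @ s)
    return y, (y - half, y + half)

# toy usage with made-up data
rng = np.random.default_rng(1)
X, t = rng.random((30, 6)), rng.random(30)
mu_bar, Sigma_bar = fit_gdw_kr(X, t, sigma=1.0, C=100.0, a=1.0)
print(forecast(X[0], X, mu_bar, Sigma_bar, sigma=1.0))
```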

3.5. Kernel Function and Model Selection

The kernel function plays an important role in kernel methods. There are three common types of kernel functions: the linear function, the polynomial function and the radial basis function (RBF). Many applications demonstrate that the RBF tends to perform well under general smoothness assumptions; with no additional knowledge of the data set available, this is the reason we adopt the kernel function [31]:
$$ k(x, x_j) = \exp\!\left\{ -\|x - x_j\|^2 / (2\sigma^2) \right\}. $$
Hyper-parameters also bear heavily on the generalization performance of kernel methods. Model selection seeks proper values of the hyper-parameters, commonly by means of cross-validation and grid search [32]. The $k$-fold cross-validation [12,13] partitions the training data into $k$ disjoint subsets of approximately equal size. A series of $k$ models are then trained, each using a different combination of $k-1$ subsets. The model selection criterion, such as the mean squared error, is then evaluated for each model on the subset of the data not used in training that model. Recently, evolutionary algorithms, such as the genetic algorithm and particle swarm optimization, have been adopted to guide the parameter selection process [33,34,35,36]. Regularized kernel-based regression uses the genetic algorithm to seek proper values of $\sigma$ and $C$. An individual in the genetic algorithm represents a possible parameter combination, and the fitness of each individual is calculated by $k$-fold cross-validation; a sketch of the cross-validated search is given below.
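As an illustration of this selection procedure, the sketch below performs a plain grid search with 5-fold cross-validation over $\sigma$ and $C$ for the regularized kernel-based regression of Equation (27). It substitutes a simple grid search for the genetic algorithm used in the paper, and the data and parameter grids are illustrative only.

```python
import numpy as np
from itertools import product

def cv_rmse(X, t, sigma, C, k=5):
    """Average k-fold cross-validation RMSE of the regularized kernel-based regression."""
    l = X.shape[0]
    folds = np.array_split(np.random.default_rng(0).permutation(l), k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(np.arange(l), fold)
        Xtr, ttr = X[train], t[train]
        K = np.exp(-np.sum((Xtr[:, None] - Xtr[None, :]) ** 2, -1) / (2 * sigma ** 2))
        mu = np.linalg.solve(np.eye(len(train)) / C + K.T @ K, K.T @ ttr)
        S = np.exp(-np.sum((X[fold][:, None] - Xtr[None, :]) ** 2, -1) / (2 * sigma ** 2))
        errs.append(np.sqrt(np.mean((S @ mu - t[fold]) ** 2)))
    return np.mean(errs)

def grid_search(X, t, sigmas, Cs):
    """Return the (sigma, C) pair with the lowest cross-validated RMSE."""
    return min(product(sigmas, Cs), key=lambda p: cv_rmse(X, t, *p))

# illustrative grids within the search ranges reported in Section 4.2
rng = np.random.default_rng(2)
X, t = rng.random((60, 6)), rng.random(60)
best_sigma, best_C = grid_search(X, t, np.logspace(-2, 0.7, 6), np.logspace(-2, 6, 9))
print(best_sigma, best_C)
```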

4. Experiments

Experiments were performed to verify the effectiveness of the proposed GDW-KR. The models were built in MATLAB 7.7, and the quadratic problems involved were solved with the QP solver of the MATLAB Optimization Toolbox. The experiments were run on a computer with a 32-bit Windows 7 OS, a 3.1-GHz Intel Core i5-3450 CPU and 4 GB RAM.

4.1. Formulation of Product-Design Time Forecast

To validate the proposed method, the design of plastic injection molds is studied. An injection mold is a single-piece-designed product, and its design process is usually driven by customer orders. Injection mold design is involved in many product development projects, so forecasting its design time is meaningful for optimizing the whole product development process.
Factor values of product-design time are obtained by the fuzzy measurable house of quality (FM-HOQ) [7]. Suppose that a design order for a kind of injection mold and the specification of the molding product are given to us. Then the customer demands should be analyzed and some useful mold characteristics should be extracted. The technical customer demands are taken into account. Some demands are originally described as quantitative information (e.g., the mold life is 3000 h), while others are expressed as qualitative information (e.g., the molding product precision is high). A unified fuzzy measurement scheme with five linguistic levels is established for all these demands [7]. The importance degrees of these demands are also represented by fuzzy weight sets.
For the specific mold design, the designer should specify the grades of membership of demand weights and demand measures, whose assignments can be made based on the customer demands given on the design order, and on the designer’s objective evaluation of the degrees of importance and scope of the demands. A survey-based methodology is applied for identifying engineering characteristics and time factors, which is performed through self-administered questionnaires from several mold companies in Nanjing. Then, nine kinds of engineering characteristics are selected: mold structure, cavity number, wainscot gauge variation, injection pressure, injection capacity, ejector type, runner shape, manufacturing precision and form feature number. Then we can construct a planning FM-HOQ to map and measure characteristics for technical demands. Among the time characteristics with large influencing weights are structure complexity (SC), model difficulty (MD), wainscot gauge variation (WGV), cavity number (CN), mold size (MS) and form feature number (FFN), the first three of which are expressed as linguistic variables and the last three as numerical ones. Here, the influencing weights that indicate the influence degree on product-design time are different from the indexes of importance in FM-HOQs. Figure 1 presents the application procedure of our model.

4.2. Product-Design Time Forecast Based on GDW-KR

In our experiments, 72 sets of molds with corresponding design time were obtained from a typical company. The detailed characteristic data and design time of these molds compose the corresponding patterns, as shown in Table 1. Numerical variables were normalized to be within [0, 1] by:
$$ \bar{x}_{id} = \frac{x_{id} - \min_{i=1,\ldots,l} x_{id}}{\max_{i=1,\ldots,l} x_{id} - \min_{i=1,\ldots,l} x_{id}}, $$
where $l$ denotes the number of samples, $d$ indexes the numerical variables, $x_{id}$ is the original value of the $d$th numerical variable for the $i$th sample, and $\bar{x}_{id}$ is its normalized value. The linguistic variables VL, L, M, H and VH were transformed into the crisp values 0.1, 0.25, 0.5, 0.75 and 0.95, respectively, in terms of expertise.
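A minimal sketch of this preprocessing step is given below; the variable names and sample values are illustrative, and the linguistic-to-crisp mapping follows the values stated above.

```python
import numpy as np

def min_max_normalize(col):
    """Normalize a numerical column to [0, 1] by the formula above."""
    return (col - col.min()) / (col.max() - col.min())

# crisp values for the linguistic levels
CRISP = {"VL": 0.1, "L": 0.25, "M": 0.5, "H": 0.75, "VH": 0.95}

cn = np.array([4.0, 4.0, 1.0, 8.0])          # e.g., cavity numbers (toy values)
print(min_max_normalize(cn))                 # approx. [0.429 0.429 0. 1.]
print([CRISP[v] for v in ["H", "L", "VL"]])  # [0.75, 0.25, 0.1]
```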
First of all, $\eta$ should be determined, mainly based on the confidence level at which the forecast interval includes the target. To make the confidence level no lower than 95%, $\eta$ should be at least 1.96, computed by $\Phi^{-1}(1 - (1 - 0.95)/2)$. The value of $\eta$ is thus set to 1.96, and the same is true of $\eta^*$. The target outputs were normalized to be within [0, 1].
The root mean square error (RMSE), the mean absolute percentage error (MAPE) and the mean absolute error (MAE) are three criteria used to optimize model parameters:
$$ \mathrm{RMSE} = \sqrt{\frac{1}{l}\sum_{i=1}^{l}\left(t_i - \hat{t}_i\right)^2}, \quad \mathrm{MAPE} = \frac{1}{l}\sum_{i=1}^{l}\left|\frac{t_i - \hat{t}_i}{t_i}\right|, \quad \mathrm{MAE} = \frac{1}{l}\sum_{i=1}^{l}\left|t_i - \hat{t}_i\right|, $$
where $\hat{t}_i$ is the forecast value for $x_i$. The underlying assumption for using the RMSE is that the errors are unbiased and follow a normal distribution [37]. The MAPE cannot be used if there is a zero value in $\{t_1, \ldots, t_l\}$, and it puts a heavier penalty on negative errors ($t_i < \hat{t}_i$) than on positive errors. The MAE is suitable for uniformly distributed errors. Because model errors are more likely to follow a normal distribution than a uniform distribution, the RMSE is a better criterion than the MAE [37]. Thus, we apply the RMSE as the criterion for optimizing model parameters.
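The three criteria can be computed directly, as in the following short sketch (illustrative names and toy values only).

```python
import numpy as np

def rmse(t, t_hat):
    return np.sqrt(np.mean((t - t_hat) ** 2))

def mape(t, t_hat):
    # undefined when some t_i equals zero, as noted above
    return np.mean(np.abs((t - t_hat) / t))

def mae(t, t_hat):
    return np.mean(np.abs(t - t_hat))

t = np.array([31.0, 41.0, 62.0])      # toy targets (hours)
t_hat = np.array([32.9, 39.2, 63.3])  # toy forecasts
print(rmse(t, t_hat), mape(t, t_hat), mae(t, t_hat))
```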
The whole data set is divided into several subsets. We choose one subset as the testing set and the others as the training set. The combination of the genetic algorithm and 5-fold cross-validation is implemented to seek the optimal parameters that minimize the RMSE on the training set. In the genetic algorithm, each individual is evaluated by performing 5-fold cross-validation on the training set. After the optimal parameters are obtained, the model is estimated using the training set. Then, we calculate the forecast values and the three criteria for the testing set. This procedure is repeated until each subset has been used once as the testing set. The testing results of the experiments are averaged over the disjoint testing sets, which cover the entire dataset. The selection ranges of $\sigma$ and $C$ are [0.01, 5] and [0.01, 10^6], respectively. The value of $a$ was selected from [10^-6, 10^6].
The whole data set is first divided into six disjoint subsets. When subset 6 is used as the testing set, the optimal parameter combination of regularized kernel-based regression is selected as $\sigma$ = 2.119 and $C$ = 998746.999, and the optimal parameter of GDW-KR turns out to be $a$ = 910.190. As illustrated by Figure 2, GDW-KR gives valid forecast intervals, except for T1 and T10. In T10, the forecast interval fails to cover its corresponding target value; in T1, the interval is too wide to provide useful information.
Actual forecast values are listed in Table 2 for comparison of the models. The RMSE, the MAPE, the MAE and the average testing time are introduced to compare the forecast performance of different models. Here, the testing time means the time that is spent on solving the optimization problem and on obtaining the testing results when the hyper-parameters are given. Table 3 shows the results from four forecast models, which indicate that GDW-KR promises as high precision as other models do, and that GDW-KR can generate the forecast intervals simultaneously, thus facilitating product development to a certain extent.
The whole data set is then divided into 4 disjoint subsets. Figure 3 illustrates the results of GDW-KR from the first 54 training samples, and demonstrates that GDW-KR still performs well. Table 4 shows the error statistics of the four forecast models. GDW-KR provides a satisfactory performance with small samples and is thus appropriate for cases with small samples.

4.3. Extended Application of GDW-KR

Besides design time forecast, GDW-KR can also be extended to other regression problems with small samples. The Slump Test dataset, the Machine CPU dataset and the Yacht Hydrodynamics dataset, which are all from the UCI repository [38], are used to evaluate the extended application of GDW-KR. In these datasets, Fv-SVM behaves the same as v-SVR, as there is no fuzzy variable. Thus, the results of Fv-SVM are not presented here. Each dataset is divided into 6 disjoint subsets. In our experiments, both the target output and numerical attributes were normalized to be within [0, 1].
The Concrete Slump Test dataset covers seven input and three output variables as well as 103 data points. The 28-day Compressive Strength is taken as the desired output variable. For this dataset, the results of GDW-KR are compared with those of the other two models. The Concrete Slump Test results are shown in Figure 4, and the three error indices of the three models are given in Table 5. On the Slump Test, GDW-KR offers forecast values with high accuracy and forecast intervals with good validity.
For the Machine CPU dataset and the Yacht Hydrodynamics dataset, the error statistics of the three forecast models are presented in Table 6 and Table 7, respectively. Figure 5 and Figure 6 show the forecast results when using subset 6 as the testing set.

5. Conclusions

Control and decision-making in product development rely on a reasonable distribution of product design time. In design time forecasting, the problems of small samples and heteroscedastic noise ought to be considered.
This paper has presented a new model of kernel-based regression with Gaussian distribution weights for product-design time forecasts, which combines Gaussian margin machines with kernel-based regression. The kernel method performs well for the problem of small samples. Unlike GMR, which assumes that the covariance matrix of the forecast values in the training set is an identity matrix multiplied by a positive scalar, GDW-KR assumes that this matrix is a positive definite diagonal matrix. GDW-KR is more suitable for addressing the problem of heteroscedastic noise than GMR, and has the advantage of providing both point forecasts and confidence intervals simultaneously.
The plastic injection mold was studied before modeling. For convincing evaluation, experiments with 72 real samples were conducted. The results have verified that GDW-KR promises not only forecast accuracy as high as that of Fv-SVM and v-SVR, but also forecast intervals that are crucial to the control and decision-making of product development. Undoubtedly, GDW-KR benefits from the merits of Gaussian margin machines.

Acknowledgments

This work was jointly funded by the National Natural Science Foundation of China under Grants 50875046 and 60934008, the Fundamental Research Funds for the Key Universities of China under Grant 2242014K10031, and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD). We thank the three reviewers and Li Lu for their valuable comments and suggestions.

Author Contributions

Zhi-Gen Shang wrote the first draft. Hong-Sen Yan corrected and improved it. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 1. 
Suppose $p, q \in [0, 1]$, and let $D_{\mathrm{KL}}(p\,\|\,q)$ denote the Kullback-Leibler divergence from a Bernoulli variable with bias $p$ to a Bernoulli variable with bias $q$. Then, we have:
$$ D_{\mathrm{KL}}(p\,\|\,q) = p\ln(p/q) + (1-p)\ln\!\left(\frac{1-p}{1-q}\right). $$
If $q > p$, we have $D_{\mathrm{KL}}(p\,\|\,q) \ge (q-p)^2/(2q)$, which implies that if $D_{\mathrm{KL}}(p\,\|\,q) \le x$, then:
$$ q \le p + \sqrt{2px} + 2x. $$
Using $\sqrt{px} \le (p+x)/2$, so that $\sqrt{2px} \le (\sqrt{2}/2)(p+x)$, we have:
$$ q \le \left(1 + \sqrt{2}/2\right)p + \left(2 + \sqrt{2}/2\right)x = C_1 p + C_2 x. $$
Let $S$ be $\{(x_i, y_i)\}_{i=1}^{l}$. We obtain:
$$ \varphi\!\left(N_m(\mu_1,\Sigma_1), S\right) = \frac{1}{l}\sum_{i=1}^{l}\varphi\!\left(N_m(\mu_1,\Sigma_1), (x_i, y_i)\right) = \frac{1}{l}\sum_{i=1}^{l}\Pr\!\left(y_i x_i^T w \le 0\right) = \frac{1}{l}\sum_{i=1}^{l}\Phi\!\left(\frac{-y_i x_i^T \mu_1}{\sqrt{x_i^T \Sigma_1 x_i}}\right). $$
Based on the two-sided PAC-Bayesian theorem (or a Gaussian version of a theorem of McAllester) [27], for any $\delta \in (0, 1]$, with probability at least $1 - \delta$ over $S$, the following holds for all posterior distributions $N_m(\mu_1, \Sigma_1)$:
$$ D_{\mathrm{KL}}\!\left(\varphi(N_m(\mu_1,\Sigma_1), S)\,\|\,\varphi(N_m(\mu_1,\Sigma_1), D)\right) \le \frac{D_{\mathrm{KL}}\!\left(N_m(\mu_1,\Sigma_1)\,\|\,N_m(\mu_0,\Sigma_0)\right) + \ln\frac{2\sqrt{l}}{\delta}}{l - 1}. $$
Equation (A2) demonstrates that the average generalization error diverges from the average training error by no more than a quantity which depends on the Kullback-Leibler divergence between the posterior and prior distributions over weight vectors.
Combining Equations (A1) and (A2) yields, for any $\delta \in (0, 1]$, with probability at least $1 - \delta$ over $S$, for $N_m(\mu_1, \Sigma_1)$:
$$ \varphi\!\left(N_m(\mu_1,\Sigma_1), D\right) \le C_1\,\frac{1}{l}\sum_{i=1}^{l}\Phi\!\left(\frac{-y_i x_i^T \mu_1}{\sqrt{x_i^T \Sigma_1 x_i}}\right) + C_2\,\frac{D_{\mathrm{KL}}\!\left(N_m(\mu_1,\Sigma_1)\,\|\,N_m(\mu_0,\Sigma_0)\right) + \ln\frac{2\sqrt{l}}{\delta}}{l - 1}. \ \square $$

References

  1. Cho, S.H.; Eppinger, S.D. A simulation-based process model for managing complex design projects. IEEE Trans. Eng. Manag. 2005, 52, 316–328. [Google Scholar] [CrossRef]
  2. Yan, H.S.; Wang, B.; Xu, D.; Wang, Z. Computing completion time and optimal scheduling of design activities in concurrent product development process. IEEE Trans. Syst. Man Cybern. Part. A Syst. Hum. 2010, 40, 76–89. [Google Scholar] [CrossRef]
  3. Yang, Q.; Zhang, X.F.; Yao, T. An overlapping-based process model for managing schedule and cost risk in product development. Concurr. Eng. Res. Appl. 2012, 20, 3–7. [Google Scholar] [CrossRef]
  4. Bashir, H.A.; Thomson, V. Models for estimating design effort and time. Des. Stud. 2001, 22, 141–155. [Google Scholar] [CrossRef]
  5. Griffin, A. Modeling and measuring product development cycle time across industries. J. Eng. Technol. 1997, 14, 1–24. [Google Scholar] [CrossRef]
  6. Jacome, M.F.; Lapinskii, V. NREC: Risk assessment and planning for complex designs. IEEE Des. Test Comput. 1997, 14, 42–49. [Google Scholar] [CrossRef]
  7. Xu, D.; Yan, H.S. An intelligent estimation method for product design time. Int. J. Adv. Manuf. Technol. 2006, 30, 601–613. [Google Scholar] [CrossRef]
  8. Burges, C.J.C. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 1998, 2, 110–115. [Google Scholar] [CrossRef]
  9. Chen, S.T. Mining informative hydrologic data by using support vector machines and elucidating mined data according to information entropy. Entropy 2015, 17, 1023–1041. [Google Scholar] [CrossRef]
  10. Vapnik, V.N. The Nature of Statistical Learning Theory, 2nd ed.; Springer-Verlag New York, Inc.: New York, NY, USA, 1999. [Google Scholar]
  11. Schölkopf, B.; Smola, A.J.; Williamson, R.; Bartlett, P.L. New support vector algorithms. Neural Comput. 2000, 12, 1207–1245. [Google Scholar] [CrossRef] [PubMed]
  12. Santiago-Paz, J.; Torres-Roman, D.; Figueroa-Ypiña, A.; Argaez-Xool, J. Using generalized entropies and OC-SVM with Mahalanobis kernel for detection and classification of anomalies in network traffic. Entropy 2015, 17, 6239–6257. [Google Scholar] [CrossRef]
  13. Ibrahim, R.W.; Moghaddasi, Z.; Jalab, H.A.; Noor, R.M. Fractional differential texture descriptors based on the Machado entropy for image splicing detection. Entropy 2015, 17, 4775–4785. [Google Scholar] [CrossRef]
  14. Benkedjouh, T.; Medjaher, K.; Zerhouni, N.; Rechak, S. Remaining useful life estimation based on nonlinear feature reduction and support vector regression. Eng. Appl. Artif. Intel. 2013, 26, 1751–1760. [Google Scholar] [CrossRef]
  15. Kivinen, J.; Smola, A.J.; Williamson, R.C. Online learning with kernels. IEEE Trans. Signal Process. 2004, 52, 2165–2176. [Google Scholar] [CrossRef]
  16. Liu, W.F.; Pokharel, P.P.; Principe, J.C. The kernel least mean square algorithm. IEEE Trans. Signal Process. 2008, 56, 543–554. [Google Scholar] [CrossRef]
  17. Chen, B.D.; Zhao, S.L.; Zhu, P.P.; Principe, J.C. Quantized kernel least mean square algorithm. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 22–32. [Google Scholar] [CrossRef] [PubMed]
  18. Chen, B.D.; Zhao, S.L.; Zhu, P.P.; Principe, J.C. Quantized kernel recursive least squares algorithm. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 1484–1491. [Google Scholar] [CrossRef] [PubMed]
  19. Wu, Z.Z.; Shi, J.H.; Zhang, X.; Ma, W.T.; Chen, B.D. Kernel recursive maximum correntropy. Signal Process. 2015, 117, 11–16. [Google Scholar] [CrossRef]
  20. Yan, H.S.; Xu, D. An approach to estimating product design time based on fuzzy v-support vector machine. IEEE Trans. Neural Netw. 2007, 18, 721–731. [Google Scholar] [PubMed]
  21. Hao, P.Y. New support vector algorithms with parametric insensitive/margin model. Neural Netw. 2010, 23, 60–73. [Google Scholar] [CrossRef] [PubMed]
  22. Crammer, K.; Mohri, M.; Pereira, F. Gaussian margin machines. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, Clearwater, FL, USA, 16–18 April 2009; pp. 105–112.
  23. Shang, Z.G.; Yan, H.S. Forecasting product design time based on Gaussian margin regression. In Proceedings of the 10th International Conference on Electronic Measurement & Instruments, Chengdu, China, 16–18 August 2011; pp. 86–89.
  24. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  25. Feng, G.; Huang, G.B.; Lin, Q.; Gay, R. Error minimized extreme learning machine with growth of hidden nodes and incremental learning. IEEE Trans. Neural Netw. 2009, 20, 1352–1357. [Google Scholar] [CrossRef] [PubMed]
  26. Shang, Z.G.; He, J.Q. Confidence-weighted extreme learning machine for regression problems. Neurocomputing 2015, 148, 544–550. [Google Scholar] [CrossRef]
  27. McAllester, D. Simplified PAC-Bayesian margin bounds. In Proceedings of the 16th Conference on Learning Theory and 7th Kernel Workshop, Washington, DC, USA, 24–27 August 2003; pp. 203–215.
  28. Shawe-Taylor, J.; Sun, S.L. A review of optimization methodologies in support vector machines. Neurocomputing 2011, 74, 3609–3618. [Google Scholar] [CrossRef]
  29. Robin, C.G.; Theodore, B.T. Quadratic programming formulations for classification and regression. Optim. Meth. Softw. 2009, 24, 175–185. [Google Scholar]
  30. Shawe-Taylor, J.; Cristianini, N. Kernel Methods for Pattern Analysis; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  31. Smola, A.J.; Schölkopf, B.; Müller, K. The connection between regularization operators and support vector kernels. Neural Netw. 1998, 11, 637–649. [Google Scholar] [CrossRef]
  32. Li, B.; Song, S.J.; Li, K. A fast iterative single data approach to training unconstrained least squares support vector machines. Neurocomputing 2013, 115, 31–38. [Google Scholar] [CrossRef]
  33. Hong, W.C. Chaotic particle swarm optimization algorithm in a support vector regression electric load forecasting model. Energy Convers. Manag. 2009, 50, 105–117. [Google Scholar] [CrossRef]
  34. Yuan, S.F.; Chu, F.L. Fault diagnosis based on support vector machines with parameter optimization by artificial immunization algorithm. Mech. Syst. Signal Process. 2007, 21, 1318–1330. [Google Scholar] [CrossRef]
  35. Pai, P.F.; Hong, W.C. Forecasting regional electricity load based on recurrent support vector machines with genetic algorithms. Electr. Power Syst. Res. 2005, 74, 417–425. [Google Scholar] [CrossRef]
  36. Lin, S.W.; Lee, Z.J.; Chen, S.C.; Tseng, T.Y. Parameter determination of support vector machine and feature selection using simulated annealing approach. Appl. Soft Comput. 2008, 8, 1505–1512. [Google Scholar] [CrossRef]
  37. Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250. [Google Scholar] [CrossRef]
  38. UC Irvine Machine Learning Repository. Available online: http://archive.ics.uci.edu/ml (accessed on 17 June 2016).
Figure 1. The application procedure of the GDW-KR model.
Figure 2. Testing results of GDW-KR when using subset 6 as the testing set.
Figure 3. Testing results of GDW-KR from the first 54 training samples.
Figure 4. Concrete slump test results of GDW-KR when using subset 6 as the testing set.
Figure 5. Machine CPU results of GDW-KR when using subset 6 as the testing set.
Figure 6. Yacht Hydrodynamics results of GDW-KR when using subset 6 as the testing set.
Table 1. Training and testing data of injection mold design.
No.   Name   SC   MD   WGV   CN   MS   FFN   Desired Outputs (h)
1Global handleLLL43.1323
2Water bottle lidHLH40.56745.5
3Medicine lidHMVL41.5637
4Footbath basinVLVLVL10.5310
5Litter basketLMH12.11242.5
6Plastic silk flowerLMM17.1429.5
71Paper-lead pulleyLMH86.1655
72Winding trayMMVH12.18741.5
Table 2. Forecast results from four different models when using subset 6 as the testing set.
No.    Desired Outputs (h)    Fv-SVM    v-SVR     GMR       GDW-KR
T1     31                     35.315    31.236    30.134    32.928
T2     41                     40.672    39.167    38.186    39.155
T3     62                     62.029    63.521    64.075    63.291
T4     34.5                   33.313    32.754    30.900    32.232
T5     16                     16.424    16.761    16.877    16.156
T6     32.5                   32.965    32.418    32.243    32.801
T7     42.5                   40.243    38.516    38.566    38.811
T8     16.5                   15.394    15.280    15.346    14.708
T9     22                     21.521    21.066    19.963    20.417
T10    54.5                   46.391    47.324    46.821    47.789
T11    55                     54.149    52.771    53.509    54.304
T12    41.5                   41.752    39.893    39.883    40.894
Table 3. Error statistics of four forecast models.
Model      RMSE     MAPE     MAE      Average Testing Time (s)
Fv-SVM     2.374    0.042    1.905    0.781
v-SVR      2.387    0.041    1.814    0.764
GMR        2.549    0.055    2.137    0.572
GDW-KR     2.366    0.041    1.848    0.583
Table 4. Error statistics of four forecast models from 54 training samples.
Model      RMSE     MAPE     MAE      Average Testing Time (s)
Fv-SVM     2.156    0.038    1.615    0.770
v-SVR      2.141    0.037    1.617    0.728
GMR        2.205    0.038    1.624    0.568
GDW-KR     2.133    0.037    1.599    0.579
Table 5. Error statistics of three forecast models on the Slump Test dataset.
Model      RMSE     MAPE     MAE      Average Testing Time (s)
v-SVR      0.021    0.044    0.014    0.795
GMR        0.023    0.047    0.015    0.583
GDW-KR     0.019    0.055    0.014    0.607
Table 6. Error statistics of three forecast models on the Machine CPU dataset.
Model      RMSE     MAPE     MAE      Average Testing Time (s)
v-SVR      0.038    0.660    0.024    0.941
GMR        0.039    0.638    0.023    0.695
GDW-KR     0.038    0.555    0.023    0.783
Table 7. Error statistics of three forecast models on the Yacht Hydrodynamics dataset.
Model      RMSE     MAPE     MAE      Average Testing Time (s)
v-SVR      0.034    3.585    0.025    1.196
GMR        0.036    3.772    0.027    0.710
GDW-KR     0.034    3.471    0.026    0.894
