2.2.1. Linear Support Vector Machine
The support vector machine (SVM) is a method proposed by Vapnik et al. on the basis of statistical learning theory to achieve structural risk minimization. Its learning strategy is to hold the empirical risk fixed while minimizing the confidence interval.
(1) Linearly separable case
The SVM originates from the optimal separating hyperplane under the linearly separable condition and is aimed mainly at binary classification. The goal is to find a hyperplane that separates the two classes of data points without error while keeping the separated points as far as possible from the classification surface [20], as shown in Figure 2.
The solution is obtained by constructing a constrained quadratic programming problem: $\min_{w,b} \frac{1}{2}\lVert w\rVert^{2}$ subject to $y_{i}(w \cdot x_{i} + b) \ge 1$, $i = 1, \dots, n$; solving it yields the classifier $f(x) = \operatorname{sgn}(w \cdot x + b)$.
According to statistical learning theory, error-free separation ensures that the empirical risk is minimized (it is zero), while maximizing the distance between the classes means that the allowable empirical risk is realized by the simplest learning machine, so the confidence interval of the generalization bound is minimized and thus the true risk is minimized. The SVM therefore has good generalization ability, which refers to the adaptability of a machine learning algorithm to fresh samples: the higher the generalization ability, the better the algorithm adapts to unseen samples. The training samples closest to the classification plane, lying on the hyperplanes parallel to the optimal hyperplane in the two classes, are called support vectors.
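To make the margin geometry concrete, the sketch below fits a linear SVM on synthetic separable data and reads off the support vectors and the margin width $2/\lVert w\rVert$. It uses scikit-learn with a very large $C$ to approximate the hard-margin formulation; the library, data, and parameter values are illustrative assumptions, not part of the original text.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds (synthetic toy data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# A very large C approximates the hard-margin problem
#   min (1/2)||w||^2   s.t.   y_i (w . x_i + b) >= 1.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
print("number of support vectors:", len(clf.support_vectors_))
print("margin width 2/||w||:", 2 / np.linalg.norm(w))
```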
(2) Linearly inseparable case
For the linearly inseparable case, slack variables $\xi_{i} \ge 0$ are introduced, and the constrained optimization problem for the classification hyperplane becomes: $\min_{w,b,\xi} \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}$ subject to $y_{i}(w \cdot x_{i} + b) \ge 1 - \xi_{i}$, $\xi_{i} \ge 0$, $i = 1, \dots, n$. Here $C$ is the penalty factor: the larger $C$ is, the heavier the penalty for misclassification. The Lagrange multiplier method converts this quadratic programming problem with linear constraints into its dual form, which can be solved by efficient algorithms.
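As a rough illustration of the penalty factor $C$ (a minimal sketch on synthetic overlapping data; scikit-learn and all parameter values are assumptions of this example), the snippet below fits the soft-margin SVM for several values of $C$ and counts the margin violations, i.e., points with $\xi_i > 0$, equivalently $y_i(w \cdot x_i + b) < 1$.

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping classes, so slack variables xi_i are needed (synthetic data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Margin violations are points with y_i (w . x_i + b) < 1.
    scores = y * clf.decision_function(X)
    print(f"C={C:>6}: support vectors={len(clf.support_)}, "
          f"margin violations={(scores < 1).sum()}")
```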
2.2.2. Nonlinear Support Vector Machine
For nonlinear classification problems, the SVM solution is to map the input vector $x$ into a high-dimensional feature space through some preselected nonlinear mapping $\Phi$ and then construct the optimal classification hyperplane in that space, as shown in Figure 3. The kernel function method avoids the costly computations in the high-dimensional feature space.
Figure 3 depicts the mapping between the input space and the high-dimensional feature space. The kernel function $K(x_{i}, x_{j})$ must satisfy the Mercer condition. The advantage of the SVM is that it only needs to define the inner product $K(x_{i}, x_{j}) = \Phi(x_{i}) \cdot \Phi(x_{j})$ in the high-dimensional space and never needs the explicit form of the mapping $\Phi$, thereby avoiding the “curse of dimensionality”. The main kernel functions in current use are the polynomial kernel, the multilayer perceptron (MLP) kernel, the Gaussian kernel, and so on:
(1) Polynomial kernel function: $K(x, x_{i}) = (x \cdot x_{i} + 1)^{d}$;
(2) Multilayer perceptron kernel function: $K(x, x_{i}) = \tanh\big(\kappa (x \cdot x_{i}) + \theta\big)$;
(3) Gaussian kernel function: $K(x, x_{i}) = \exp\big(-\lVert x - x_{i}\rVert^{2} / (2\sigma^{2})\big)$.
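The three kernels above can be written out directly; the sketch below (scikit-learn, with illustrative parameter values $d$, $\kappa$, $\theta$, $\sigma$ chosen for this example only) passes each one to an SVM as a callable Gram-matrix function. Note that the MLP/sigmoid kernel satisfies the Mercer condition only for some parameter settings.

```python
import numpy as np
from sklearn.svm import SVC

# The three kernels from the list above, written out explicitly.
def poly_kernel(X, Y, d=3):
    return (X @ Y.T + 1) ** d                       # (x . x_i + 1)^d

def mlp_kernel(X, Y, kappa=0.01, theta=-1.0):
    # Mercer holds only for some (kappa, theta); values here are illustrative.
    return np.tanh(kappa * (X @ Y.T) + theta)       # tanh(kappa x . x_i + theta)

def gauss_kernel(X, Y, sigma=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))           # exp(-||x - x_i||^2 / 2 sigma^2)

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)             # a nonlinearly separable target

for name, k in [("poly", poly_kernel), ("mlp", mlp_kernel), ("gauss", gauss_kernel)]:
    clf = SVC(kernel=k).fit(X, y)                   # SVC accepts a callable kernel
    print(name, "training accuracy:", clf.score(X, y))
```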
2.2.3. Multi-Class Support Vector Machine
The multi-class support vector machine extends the SVM, which was originally proposed for two-class classification. At present, there are mainly the following methods to realize multi-class classification with SVMs.
(1) One-against-rest method: for an $N$-class problem, $N$ SVM sub-classifiers are built, where the $i$-th sub-classifier separates the data of class $i$ from the data of all other classes.
(2) One-against-one method: for an $N$-class problem, $N(N-1)/2$ SVMs are built, one trained between each pair of classes to separate them. The multi-class classifier constructed by the one-against-one method has a small training scale per sub-problem, balanced training data, and is easy to extend; both strategies are illustrated in the sketch after this list.
(3) Modifying the objective function directly to build a single multi-class SVM. Because the number of variables becomes excessive, this method is practical only for small problems.
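A brief sketch of the first two strategies, using scikit-learn's one-vs-rest and one-vs-one wrappers on an arbitrary 10-class dataset (both choices are assumptions of this example), confirms the sub-classifier counts of $N$ and $N(N-1)/2$:

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # N = 10 classes

# One-against-rest: N sub-classifiers, each separating one class from the rest.
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
# One-against-one: N(N-1)/2 sub-classifiers, one per pair of classes.
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

print("one-vs-rest sub-classifiers:", len(ovr.estimators_))  # 10
print("one-vs-one sub-classifiers:", len(ovo.estimators_))   # 10 * 9 / 2 = 45
```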
The SVM has been successfully applied in many fields, such as face recognition, handwritten digit recognition, automatic text classification, and multi-dimensional function prediction, and many variant algorithms have been derived from it.