2.4.1. The Improved Conditional Random Field (CRF) Model and Other Models
The CRF is a probability model that has been widely used in image segmentation, stereoscopic vision, and activity analysis because of its ability to incorporate spatial information [29]. In this paper, a water-quality classification method based on a detail-preserving smoothing CRF was proposed: the class probabilities produced by the RF classifier define the unary potential of the CRF, and a linear combination of a spatial smoothing term and a local class label cost term defines the pairwise potential, so that the classification combines spatial contextual information while retaining detailed information.
The CRF model has been developed within a unified probability framework to model local neighborhood interactions between random variables, where the posterior probability is expressed directly as a Gibbs distribution [38]:

P(x|y) = (1/Z(y)) exp(−∑_{c∈C} ψ_c(x_c, y))    (1)

where y is the observation data of the input image, that is, the pixel-by-pixel spectral vector; x represents the class labels; Z(y) is the partition function; ψ_c(x_c, y) is the potential function, which models the spatial interaction of random variables locally based on the neighborhood system and the clique c in the image; and C represents the set of cliques (fully connected subgraphs). In this paper, an 8-neighborhood system was applied in the pairwise CRF framework.
Assume an observation field y = {y_1, …, y_N}, where N is the total number of pixels, and a labeling field x = {x_1, …, x_N}. According to the posterior distribution of the label x given the observation y, the corresponding Gibbs energy is shown in Equation (2):

E(x|y) = −ln P(x|y) − ln Z(y) = ∑_{c∈C} ψ_c(x_c, y)    (2)

In order to find the label image x that maximizes the posterior probability P(x|y), based on the Bayesian maximum a posteriori (MAP) rule, the MAP label x̂ of the random field is given by:

x̂ = argmax_x P(x|y) = argmin_x [∑_i ψ_i(x_i, y) + λ ∑_i ∑_{j∈N_i} ψ_{ij}(x_i, x_j, y)]    (3)
When the posterior probability is maximum, the energy function is minimum. In Equation (3), ψ_i(x_i, y) is the unary potential function, which represents the segmentation result when each pixel is considered independently; ψ_{ij}(x_i, x_j, y) is the pairwise potential function, which represents the influence of the relationship between pixels on the segmentation. The nonnegative λ is a tuning parameter that controls the weight of the pairwise potential: the larger λ is, the more obvious the smoothing effect.
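As a concrete illustration, the energy bookkeeping behind Equations (2) and (3) can be sketched as follows. This is a minimal toy sketch, not the paper's implementation: the unary values and the Potts-style pairwise term are made-up stand-ins for the RF probabilities and the detail-preserving pairwise potential.

```python
# Toy sketch of the Gibbs energy: a per-pixel unary term plus a
# lambda-weighted pairwise term summed over an 8-neighborhood.
OFFSETS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def gibbs_energy(labels, unary, lam, pairwise):
    """labels[r][c]: class index; unary[r][c][k]: cost of class k at (r, c);
    pairwise(xi, xj): neighbor interaction cost. Returns E(x|y)."""
    rows, cols = len(labels), len(labels[0])
    e_unary = sum(unary[r][c][labels[r][c]]
                  for r in range(rows) for c in range(cols))
    e_pair = 0.0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in OFFSETS_8:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    e_pair += pairwise(labels[r][c], labels[rr][cc])
    return e_unary + lam * e_pair / 2.0  # each unordered pair was visited twice

# Hypothetical 2x2 image, two classes; the Potts term penalizes disagreement.
unary = [[[0.1, 2.0], [0.2, 1.5]],
         [[1.8, 0.3], [0.1, 2.2]]]
potts = lambda xi, xj: 0.0 if xi == xj else 1.0
smooth_labels = [[0, 0], [0, 0]]
mixed_labels = [[0, 0], [1, 0]]
e_smooth = gibbs_energy(smooth_labels, unary, 1.0, potts)
e_mixed = gibbs_energy(mixed_labels, unary, 1.0, potts)
```

Under the MAP rule of Equation (3), the labeling with the lower energy is preferred; increasing lam strengthens the smoothing at the expense of the unary evidence.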
The unary potential function models the relationship between the class label and the pixel spectral data. The probability estimate of each pixel is calculated by a discriminative classifier, given the feature vectors. It plays the leading role in the classification process and is generally the posterior probability of a supervised classifier. The unary potential function is defined as:

ψ_i(x_i, y) = −ln P(x_i = l_k | f_i(y))    (4)

where f_i(y) represents the feature vector at position i, which comes from the spectral dimension mapping of a pixel in the image, and P(x_i = l_k | f_i(y)) is the probability that pixel i takes class label l_k given that feature vector. Because the RF algorithm is stable and achieves good classification without parameter tuning, the RF classifier was selected to supply the unary potential.
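The unary construction can be sketched as below. The probabilities are hypothetical stand-ins for the per-pixel class probabilities output by the RF classifier (e.g., predict_proba in scikit-learn); only the −ln p mapping of the unary potential described above is illustrated.

```python
import math

def unary_potentials(class_probs, eps=1e-10):
    """Map per-class probabilities for one pixel to unary costs -ln p."""
    return [-math.log(max(p, eps)) for p in class_probs]

rf_probs = [0.7, 0.2, 0.1]  # hypothetical RF output for one pixel
psi = unary_potentials(rf_probs)
# The most probable class has the lowest unary cost.
best_class = min(range(len(psi)), key=psi.__getitem__)
```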
Based on the probability distribution produced by the unary potential, the pairwise potential function models the relationship between the class labels of pixels in the neighborhood. The similarity between pairs of pixels is measured by local image features; it affects the labels assigned to neighboring pixels and reflects the interaction between points. To minimize the Gibbs energy of the model, if the feature difference between pixels is large, the pairwise potential value should be small, that is, the labeling result should be accepted; if the feature difference between pixels is small, the pairwise potential value should be large, and the labeling result should be modified by the model. The pairwise potential function is expressed as:

ψ_{ij}(x_i, x_j, y) = g_{ij}(y) δ(x_i ≠ x_j) + β D(x_i, x_j)    (5)

where g_{ij}(y) represents a smoothing term related to the data y, g_{ij}(y) = 1/(1 + ‖y_i − y_j‖), in which ‖y_i − y_j‖ is the Euclidean distance between the spectral vectors y_i and y_j; δ(·) is the indicator function, equal to 1 when the neighboring labels differ and 0 otherwise; and D(x_i, x_j) represents the cost between the labels x_i and x_j in the neighborhood. The parameter β controls the weight of the label cost term in the pairwise potential function; its range is usually [0, 4]. The local class label cost term D(x_i, x_j) is defined as:

D(x_i, x_j) = −ln P(x_j = l_k | f_i(y))    (6)

where P(x_j = l_k | f_i(y)) is the label probability given by the RF classifier, f_i(y) represents the spectral feature vector at position i, and l_k is the class label.
The local class label cost term affects the label estimation of the current pixel according to the probability distribution of adjacent pixels, so the model can smooth the classification results while taking spatial contextual information into account.
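The pairwise construction described above can be sketched as follows. This is a hedged toy sketch of one plausible reading: a spectral smoothing term 1/(1 + ‖y_i − y_j‖) applied when neighboring labels disagree, plus a β-weighted label cost drawn from the classifier probability. The function names and all numeric inputs are illustrative, not the paper's code.

```python
import math

def smoothing_term(y_i, y_j):
    """Spectral similarity 1 / (1 + Euclidean distance) between two pixels."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_i, y_j)))
    return 1.0 / (1.0 + dist)

def pairwise_potential(x_i, x_j, y_i, y_j, prob_i_at_j, beta=2.0, eps=1e-10):
    """prob_i_at_j[k]: classifier probability of class k at the neighbor."""
    smooth = smoothing_term(y_i, y_j) if x_i != x_j else 0.0
    label_cost = -math.log(max(prob_i_at_j[x_i], eps))
    return smooth + beta * label_cost

# Similar spectra + disagreeing labels -> strong smoothing penalty;
# very different spectra -> weak penalty, so class edges (details) survive.
similar = pairwise_potential(0, 1, (0.5, 0.5), (0.5, 0.52), [0.6, 0.4])
edge = pairwise_potential(0, 1, (0.5, 0.5), (0.9, 0.1), [0.6, 0.4])
```

The contrast between the two calls is the detail-preserving idea: disagreement between spectrally similar neighbors is costly, while disagreement across a genuine spectral edge is cheap.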
As mentioned earlier, the local class label cost term is expressed as a probability estimate of the spatial distribution of class labels. Thus, the final classification accuracy depends on the accuracy of this probability estimate, which is obtained by majority voting over the original RF classification map. In order to effectively remove salt-and-pepper classification noise, the labels of adjacent cells should be taken into account; therefore, the most frequent class label in each pixel's neighborhood is taken as the probability estimate for the segmentation result.
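The majority-voting step can be sketched as a mode filter over a 3×3 window: each pixel takes the most frequent RF label in its neighborhood, which suppresses isolated salt-and-pepper pixels. The label map below is a toy example, and window size and tie-breaking are assumptions.

```python
from collections import Counter

def majority_filter(label_map):
    """Replace each label by the mode of its 3x3 neighborhood (image borders
    use the available cells only; ties break by first-seen order)."""
    rows, cols = len(label_map), len(label_map[0])
    out = [row[:] for row in label_map]
    for r in range(rows):
        for c in range(cols):
            votes = Counter()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        votes[label_map[rr][cc]] += 1
            out[r][c] = votes.most_common(1)[0][0]
    return out

noisy = [[1, 1, 1],
         [1, 2, 1],   # the isolated "2" is salt-and-pepper noise
         [1, 1, 1]]
cleaned = majority_filter(noisy)
```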
In summary, aiming at water-quality classification within China's water-quality assessment system, a supervised CRF classification method combining spectral information with spatial contextual information was proposed in this paper. It takes the probability distribution of the RF classifier as the unary potential, and defines a linear combination of a spatial smoothing term and a local label cost term as the pairwise potential. The model can predict the class label of the current pixel with reference to the water-quality level of adjacent pixels. In addition, three pixel-based classifiers were added to the experiment for comparison.
Three other models were discussed in this paper, namely the pixel-based RF, DT, and DNN. For image classification, the RF is not the best-performing algorithm; however, due to its simplicity, ease of implementation, strong generalization ability, and good performance on many datasets, it has been widely used in academic research and industrial applications [39,40]. The RF is an algorithm that integrates multiple trees through the idea of ensemble learning. Its basic unit is the decision tree, so N trees produce N classification results; the RF aggregates the N votes, and the class with the most votes is the final output. Although the DT model is no longer a mainstream standalone classifier, it is widely used as a base learner in more complex algorithms because of its speed, accuracy, and ease of understanding [41,42]. DT classification represents the process of classifying instances based on features. Built on if-then rules, it is fast and remains a commonly used classifier. Since the number of samples per class is not uniform, the DT model automatically adjusts the class weights based on the sample counts. LeCun et al. [43] published an article on deep learning in Nature in 2015, expressing the importance of the model to human society. The DNN is a pixel-based supervised learning model and the basis of other deep learning models. The expressive ability of a neural network depends on the optimization algorithm; the optimizer selection is described further below. The training process of a DNN consists of two parts: the forward propagation of the signal and the backward propagation of the error. The back-propagation algorithm optimizes the weights and biases of the neural network according to the defined loss function, so that the model's loss reaches a smaller value. In this study, the algorithms were implemented in Python and TensorFlow.
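The two-phase training loop can be sketched for the smallest possible case: a single sigmoid neuron with squared-error loss, trained by gradient descent on a toy threshold task. This is a didactic sketch only; the paper's DNN is a TensorFlow model with many layers and a separately chosen optimizer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=3000, lr=0.5):
    """Stochastic gradient descent on a one-neuron sigmoid 'network'."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = sigmoid(w * x + b)              # forward pass of the signal
            # backward pass of the error: chain rule on 0.5 * (out - target)^2
            grad = (out - target) * out * (1.0 - out)
            w -= lr * grad * x
            b -= lr * grad
    return w, b

# Toy binary task: the label is 1 exactly when x > 0.
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b = train(data)
```

After training, the fitted neuron separates the two sides of the toy threshold; a deep network repeats exactly this forward/backward pattern layer by layer.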
Nature in 2015, expressing the importance of the model to human society. The DNN is a pixel-based supervised learning model and the basis of other deep learning models. The ability of the neural networks to express models is dependent on the optimization algorithms. The optimizer selection will be described further later. The training process of a DNN consists of two parts: the forward propagation of the signal and the reverse propagation of the error. The back-propagation algorithm can optimize the weight and bias of the neural network according to the defined loss function, so that the loss value of the model reaches a smaller value. In this study, the algorithms were implemented in Python and TensorFlow.