3.1. Neighborhood Point Selection
The classification framework of this study is adapted from a single-wavelength LiDAR point cloud classification framework [28]. Within such a framework, the selection of neighborhood points affects how fully the original information is used and how well features are extracted, so the neighborhood point selection method is itself a focus of research [27,28,29,30]. Commonly used neighborhood types include the spherical neighborhood, the cylindrical neighborhood, and the K-nearest neighborhood. For a given point P, the spherical neighborhood contains all points whose distance from P is less than a radius R; the cylindrical neighborhood contains all points whose ground-projected distance from P is less than R; and the K-nearest neighborhood contains the K points closest to P [30]. Building on traditional neighborhood types such as the cylindrical neighborhood, Wang et al. [29] improved them and achieved better performance in classifying power lines in urban areas. All three neighborhood types rely on empirically or heuristically chosen radius or K values, which often differ across scenarios. To address this problem, adaptive neighborhoods have been proposed, including surface-variation-based, dimensionality-based, and eigenentropy-based neighborhood point selection [27,31,32].
The optimal neighborhood scale differs across scenarios. Although an adaptive neighborhood can select the optimal scale from local features, the computational and time costs of a point-by-point search are very high. In addition, the neighborhood selection methods above extract features at only a single scale. As far as we know, there is no research on land cover classification using multi-scale neighborhood features on Titan multispectral LiDAR data [25]. Therefore, this study extracts the spatial and spectral features of the multispectral LiDAR point cloud at multiple scales. Because the spatial density of the Titan point cloud is uneven, spherical or cylindrical neighborhoods cannot guarantee the number of neighborhood points, and spatial features sometimes cannot be extracted, so K-nearest neighborhoods are used for feature extraction. Since larger K values raise the computational cost of feature extraction, smaller K values are preferred when choosing the multi-scale neighborhood sizes. Therefore, the study uses four K values, 20, 50, 100, and 150, to extract the spatial–spectral features.
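As an illustration of this step, the following is a minimal sketch of multi-scale K-nearest-neighbor feature extraction with a k-d tree; the eigenvalue-based descriptors (linearity, planarity, sphericity, surface variation) are illustrative assumptions standing in for the full feature set defined in Section 3.2:

```python
import numpy as np
from scipy.spatial import cKDTree

def multiscale_knn_features(points, scales=(20, 50, 100, 150)):
    """Eigenvalue-based geometric features at several K-nearest-neighbor scales.

    points: (N, 3) array of XYZ coordinates; returns an (N, 4 * len(scales)) matrix.
    """
    tree = cKDTree(points)
    blocks = []
    for k in scales:
        # k nearest neighbors of every point (the query point itself is included).
        _, idx = tree.query(points, k=k)
        neigh = points[idx]                                   # (N, k, 3)
        centered = neigh - neigh.mean(axis=1, keepdims=True)
        cov = np.einsum('nki,nkj->nij', centered, centered) / k
        eig = np.linalg.eigvalsh(cov)[:, ::-1]                # l1 >= l2 >= l3 >= 0
        l1, l2, l3 = eig[:, 0] + 1e-12, eig[:, 1], eig[:, 2]
        blocks.append(np.stack([
            (l1 - l2) / l1,        # linearity
            (l2 - l3) / l1,        # planarity
            l3 / l1,               # sphericity
            l3 / (l1 + l2 + l3),   # surface variation
        ], axis=1))
    return np.hstack(blocks)
```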
3.3. Feature Selection Based on the Equilibrium Optimizer
The construction of the feature set is empirical. To screen out the features suitable for land cover classification with Titan multispectral LiDAR point cloud data, a high-dimensional feature set must first be constructed; accordingly, in Section 3.2 we obtained 160-dimensional spatial–spectral features. At the same time, multi-scale neighborhood features constructed from the same definition inevitably contain redundancy, so feature selection is necessary.
Feature selection improves classification accuracy by selecting a subset of features, or reduces the dimensionality of the feature set and the computational cost without reducing the classifier's accuracy [33]. According to the evaluation criterion, feature selection can be divided into filter and wrapper methods [34]. Filter feature selection evaluates features with a chosen measurement index and retains the subset of features that performs well; the synergies between features are not taken into account in this process, and considering only the representation of individual features is incomplete [35]. Wrapper feature selection takes the classification result as the evaluation criterion and casts feature selection as a search optimization problem. Most wrapper feature selection algorithms are built on optimization algorithms, but traditional optimization algorithms cannot handle such complex problems, so metaheuristic algorithms have been adopted for feature selection. The genetic algorithm (GA) [36], particle swarm optimization (PSO) [37], simulated annealing (SA) [38], and ant colony optimization (ACO) [39] are some of the most conventional metaheuristic approaches. They belong to different categories of metaheuristics, and researchers in many fields have evaluated their performance. The Equilibrium Optimizer is a physics-based optimization algorithm inspired by the control-volume mass balance model used to estimate dynamic and equilibrium states [40]. In tests on 58 benchmark functions, the Equilibrium Optimizer significantly outperformed PSO, GA, the Grey Wolf Optimizer (GWO), the Gravitational Search Algorithm (GSA), the Salp Swarm Algorithm (SSA), and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
The basic theory of the Equilibrium Optimizer comes from the control volume mass balance model:

$$V\frac{dC}{dt} = QC_{eq} - QC + G \quad (1)$$

where $V$ is the control volume, $C$ is the concentration of particles in the control volume, $V\frac{dC}{dt}$ is the rate of change of mass in the control volume, $Q$ is the volumetric flow rate into and out of the control volume, $C_{eq}$ is the concentration of particles inside the control volume at an equilibrium state without generation, and $G$ is the mass generation rate inside the control volume.
After deforming and integrating Formula (1), the basic formula of the Equilibrium Optimizer can be obtained. In it, $\vec{C}_{eq}$ is drawn from the equilibrium pool, which retains the four best individuals and one average individual selected during the iterative process. $\vec{F}$ is an exponential term that controls and balances the weight of the Equilibrium Optimizer between exploration and exploitation. $\vec{G}$ represents the generation rate, which drives the update of each individual's concentration value. For the detailed calculation of $\vec{C}_{eq}$, $\vec{F}$, and $\vec{G}$, refer to Faramarzi et al. [40]. With these terms introduced, Formula (1) is expressed as Formula (2), where $V$ is set to the unit value 1:

$$\vec{C} = \vec{C}_{eq} + \left(\vec{C} - \vec{C}_{eq}\right)\cdot\vec{F} + \frac{\vec{G}}{\lambda V}\left(1 - \vec{F}\right) \quad (2)$$
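For illustration, a minimal sketch of one population update under Formula (2); the term definitions and the constants a1 = 2, a2 = 1, GP = 0.5 are taken from Faramarzi et al. [40], while the vectorized layout is our assumption:

```python
import numpy as np

def eo_update(C, eq_pool, it, max_it, a1=2.0, a2=1.0, gp=0.5, V=1.0, rng=np.random):
    """One Equilibrium Optimizer update of the population matrix C (n_ind, dim).

    eq_pool: the equilibrium pool, i.e., the 4 best individuals plus their mean.
    """
    t = (1.0 - it / max_it) ** (a2 * it / max_it)        # shrinks as iterations progress
    C_new = np.empty_like(C)
    for i in range(C.shape[0]):
        Ceq = eq_pool[rng.randint(len(eq_pool))]         # random draw from the pool
        lam, r = rng.rand(C.shape[1]), rng.rand(C.shape[1])
        F = a1 * np.sign(r - 0.5) * (np.exp(-lam * t) - 1.0)   # exponential term
        gcp = 0.5 * rng.rand() if rng.rand() >= gp else 0.0    # generation-rate control
        G = gcp * (Ceq - lam * C[i]) * F                       # generation rate
        C_new[i] = Ceq + (C[i] - Ceq) * F + (G / (lam * V)) * (1.0 - F)  # Formula (2)
    return C_new
```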
Since the original Equilibrium Optimizer was designed for continuous optimization problems, whereas feature selection algorithms are usually binary (each feature is either selected or removed), the original algorithm needs to be improved. The improvements cover two aspects: the calculation of the fitness values and the feature selection strategy.
Given the purpose of feature selection, namely reducing the feature dimensionality while improving classification accuracy, we adopt and improve the fitness function construction strategy used by Zhang et al. [35] in an evolutionary algorithm, which accounts for both classification accuracy and dimensionality reduction:

$$Fitness = \rho \cdot acc + (1 - \rho)\cdot\gamma$$

where $acc$ represents the classification accuracy, $\gamma$ represents the dimensionality reduction rate, and $\rho$ is a weight that varies with the feature dimension. When the feature dimension is small, $\rho$ is close to 1, and the fitness function mainly reflects the classification accuracy. As the feature dimension increases, $\rho$ decreases toward 0.9, so that dimensionality reduction gains importance. Under this fitness function, the algorithm therefore considers both the classification accuracy of the model and the reduction of the feature dimension, with classification accuracy remaining the most important.
$$acc = \frac{N_c}{N_t}, \qquad \gamma = \frac{D - d}{D}$$

where $N_c$ is the number of correctly classified samples, $N_t$ is the total number of training samples, $d$ represents the dimension of the feature subset after feature selection, and $D$ represents the original feature dimension.
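A minimal sketch of this fitness evaluation; the linear ramp of ρ between 0.9 and 1 is an assumption chosen to match the behavior described above:

```python
def fitness(acc, d, D):
    """Fitness of a feature subset: acc is the classification accuracy,
    d the selected subset dimension, D the full feature dimension (160 here)."""
    gamma = (D - d) / D          # dimensionality reduction rate
    rho = 0.9 + 0.1 * gamma      # assumed: ~1 for small subsets, ~0.9 as d approaches D
    return rho * acc + (1.0 - rho) * gamma
```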
Classification accuracy is measured with the K-Nearest Neighbors (KNN) classifier. Its classification principle is that samples that are close to each other in the feature space belong to the same category. To reduce the influence of local outliers, the K nearest neighbor points are selected, and the center point is assigned the category with the highest proportion among those neighbor points [41]. The classifier is “lazy”: no explicit model is built in advance, and the computation is deferred until test samples are queried. The key parameters of the classifier are the K value and the distance metric. The K value is generally determined empirically; to reduce the classification time, K is set to 5 in this study. The distance metric computes the distance between sample feature vectors to describe the degree of similarity between samples, and the commonly used Euclidean distance is adopted. Let $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ be two samples, where $n$ is the number of features; the distance between the two samples is expressed as:

$$d(x, y) = \sqrt{\sum_{i=1}^{n}\left(x_i - y_i\right)^2}$$
The improvement of the feature selection strategy is essentially the binarization of the equilibrium concentration value C. In feature selection, binary coding is typically used to select or eliminate features, so the initial concentration value of each individual is initialized with a uniform random number between 0 and 1. We set the threshold to 0.5 and use Formula (3) to convert continuous concentration values into discrete values, which serve as the binary coding for feature selection:

$$C_b = \begin{cases} 1, & C > 0.5 \\ 0, & C \le 0.5 \end{cases} \quad (3)$$

Since the initialization is random, each feature has an equal probability of being selected or eliminated. After the Equilibrium Optimizer uses the optimal individuals to update the concentration value of each individual, some concentration values may exceed 1 or fall below 0. Although such values do not affect the current feature selection step, the error would accumulate over subsequent iterations, so concentration values greater than 1 or less than 0 are corrected to 1 and 0, respectively, at the end of each iteration.
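A sketch of this binarization and clipping step, assuming numpy arrays:

```python
import numpy as np

def binarize_and_clip(C, threshold=0.5):
    """Map continuous concentration values to a binary feature mask (Formula (3))
    and clip out-of-range concentrations so the error does not accumulate."""
    mask = (C > threshold).astype(int)   # 1 = feature selected, 0 = feature removed
    np.clip(C, 0.0, 1.0, out=C)          # correct values that drifted outside [0, 1]
    return mask
```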
In terms of parameter settings, the number of iterations and the number of individuals are both set to 100. Since there are only 8000 training samples, 5-fold cross-validation is used to obtain a relatively stable classification result from the KNN classifier.
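Under these settings, the accuracy term of the fitness function can be estimated as in the following sketch; the scikit-learn API is an implementation assumption, with X, y, and mask denoting the training features, labels, and binary selection mask:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def knn_cv_accuracy(X, y, mask):
    """Mean 5-fold cross-validated accuracy of a 5-NN (Euclidean) classifier
    on the feature subset selected by the binary mask."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:                   # an empty subset gets the worst score
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
    return cross_val_score(clf, X[:, cols], y, cv=5).mean()
```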
3.4. Classification and Accuracy Evaluation
SVM is an efficient machine learning classification method. It finds the support vectors on the category boundary and derives the optimal classification boundary from them. The robustness of SVM has been demonstrated in text classification [42], image classification [43], biometric recognition [44], and other fields, and it is also widely used in land cover classification [45,46,47]. SVM places low demands on training samples, and good classification results can be obtained from small training sets. Because labeling point cloud categories is laborious and large training sets are hard to obtain, the SVM classifier's suitability for small-sample training makes it a natural choice for land cover classification of the Titan multispectral LiDAR point cloud. After testing, the polynomial kernel function was chosen, with coef0 set to 8 and gamma set to −0.0625.
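A sketch of this classifier configuration with scikit-learn (an implementation assumption; gamma is taken as 0.0625 = 2^−4 because LIBSVM-style polynomial kernels require a positive gamma, and the unstated polynomial degree is left at its default of 3):

```python
from sklearn.svm import SVC

def train_svm(X_train, y_train):
    """Polynomial-kernel SVM with the settings reported above.

    Kernel: K(u, v) = (gamma * <u, v> + coef0) ** degree.
    """
    clf = SVC(kernel='poly', coef0=8, gamma=0.0625)
    return clf.fit(X_train, y_train)
```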
The evaluation of the classification results measures the effectiveness of the feature construction and the applicability of the feature selection. The confusion matrix describes the classification performance of each category. This study generates a confusion matrix from the classification results and uses it to calculate the producer's accuracy (PA), user's accuracy (UA), overall accuracy (OA), class average accuracy (AA), and kappa coefficient:

$$PA_i = \frac{p_i}{t_i}, \qquad UA_i = \frac{p_i}{c_i}$$

$$OA = \frac{1}{N}\sum_{i=1}^{m} p_i, \qquad AA = \frac{1}{m}\sum_{i=1}^{m} PA_i$$

$$Kappa = \frac{N\sum_{i=1}^{m} p_i - \sum_{i=1}^{m} t_i c_i}{N^2 - \sum_{i=1}^{m} t_i c_i}$$

Here, $i$ represents the category of ground objects, $p_i$ is the number of correctly classified samples of category $i$, $t_i$ and $c_i$ are the numbers of category-$i$ samples in the ground-truth labels and the predicted labels, respectively, $N$ is the total number of samples, and $m$ is the number of categories.
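A sketch computing these metrics from a confusion matrix; taking rows as ground truth and columns as predictions is a conventional assumption:

```python
import numpy as np

def accuracy_metrics(cm):
    """PA, UA, OA, AA, and kappa from a confusion matrix where
    cm[i, j] is the number of class-i samples predicted as class j."""
    p = np.diag(cm).astype(float)    # correctly classified counts per class
    t = cm.sum(axis=1)               # ground-truth totals per class
    c = cm.sum(axis=0)               # predicted totals per class
    N = float(cm.sum())
    pa, ua = p / t, p / c
    oa = p.sum() / N
    aa = pa.mean()
    kappa = (N * p.sum() - (t * c).sum()) / (N**2 - (t * c).sum())
    return pa, ua, oa, aa, kappa
```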