In this section, the proposed method is assessed on one private dataset and ten publicly available datasets of airborne lidar surveys. Its results are compared with those of several interpolators commonly used for lidar-derived DTM construction: IDW, OK, linear, bicubic and NN interpolation.
IDW is a deterministic method for multivariate interpolation from a scattered set of known points [40]. To predict a value at an unmeasured location, IDW uses the measured values surrounding the prediction location; values closer to the prediction location have more influence on the prediction than those farther away [41]. Thus, greater weights are given to the closest points. Either all points within a specified radius or a specified number of nearest points can be used for prediction; here, we used the latter.
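The distance-based weighting can be sketched as follows. This is an illustrative Python implementation of k-nearest-neighbor IDW (not the authors' MATLAB code); the function name and default parameters are assumptions for this sketch.

```python
import numpy as np
from scipy.spatial import cKDTree

def idw_interpolate(xy_known, z_known, xy_query, k=12, power=2.0):
    """IDW using the k nearest samples; weights are inverse distances
    raised to `power`, so closer points dominate the prediction."""
    tree = cKDTree(xy_known)
    dist, idx = tree.query(xy_query, k=k)        # (m, k) distances and indices
    dist = np.where(dist == 0.0, 1e-12, dist)    # guard exact hits on a sample
    weights = 1.0 / dist**power
    return np.sum(weights * z_known[idx], axis=1) / np.sum(weights, axis=1)
```

Because the weights are normalized, the prediction is always a convex combination of the neighboring heights and therefore stays within their range.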
Kriging is an advanced geostatistical procedure that generates an estimated surface from a scattered set of points [42]. It is similar to IDW in that it weights the surrounding measured values to derive a prediction for an unmeasured location. In IDW, however, the weight depends solely on the distance to the prediction location, whereas in kriging the weights are based not only on the distances between the measured points and the prediction location, but also on the overall spatial arrangement of the measured points [43]. To use the spatial arrangement in the weights, the spatial autocorrelation must be quantified with a semivariogram. Thus, a key step in kriging is to accurately fit a theoretical variogram model to the experimental variogram. As one type of kriging, OK assumes that the variation in z-values is free of any structural component (drift) [44].
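The experimental semivariogram mentioned above bins all point pairs by separation distance and averages half the squared height differences per bin. The following is a minimal, isotropic Python sketch of that computation (the function name and binning scheme are illustrative, not the cited MATLAB 'variogram' function):

```python
import numpy as np

def experimental_variogram(xy, z, n_bins=15, max_dist=None):
    """Empirical semivariance gamma(h) = 0.5 * mean((z_i - z_j)^2)
    over point pairs whose separation falls in each lag bin."""
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1))
    g = 0.5 * (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)            # each pair counted once
    d, g = d[iu], g[iu]
    if max_dist is None:
        max_dist = d.max() / 2                   # common rule of thumb
    edges = np.linspace(0.0, max_dist, n_bins + 1)
    lags, gamma = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (d > lo) & (d <= hi)
        if m.any():
            lags.append(d[m].mean())
            gamma.append(g[m].mean())
    return np.array(lags), np.array(gamma)
```

A theoretical model (e.g., spherical or exponential) would then be fitted to these (lag, gamma) pairs to supply the kriging weights.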
Bicubic interpolation is the lowest-order two-dimensional interpolation procedure that maintains the continuity of the function and its first derivatives across cell boundaries [45]. It requires 16 values to derive the coefficients, which can be obtained with two schemes. The first uses the elevations at the four vertices of the grid cell together with three derivatives [46], while the second uses the elevations of sample points directly, without estimating the partial derivatives [45]. The first scheme is commonly adopted to interpolate a grid-based DTM to a higher spatial resolution, since the three derivatives can be easily computed with finite differences. For scattered sample points, however, the derivatives are difficult to compute, so we used the second scheme to interpolate the randomly distributed sample points.
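One way to realize the second scheme is to fit the 16 bicubic coefficients to the 16 nearest samples of each query point by least squares. The sketch below is an illustrative Python realization under that assumption (the exact fitting strategy of reference [45] may differ):

```python
import numpy as np
from scipy.spatial import cKDTree

def bicubic_scattered(xy_known, z_known, xy_query):
    """Fit z = sum a_ij * x^i * y^j (i, j = 0..3) to the 16 nearest
    samples of each query point; the value at the query point is a_00."""
    tree = cKDTree(xy_known)
    out = np.empty(len(xy_query))
    for q, (x0, y0) in enumerate(xy_query):
        _, idx = tree.query([x0, y0], k=16)
        # Center coordinates at the query point for numerical stability
        x = xy_known[idx, 0] - x0
        y = xy_known[idx, 1] - y0
        A = np.stack([x**i * y**j for i in range(4) for j in range(4)], axis=1)
        coef, *_ = np.linalg.lstsq(A, z_known[idx], rcond=None)
        out[q] = coef[0]  # constant term = value at the centered origin
    return out
```

Since the basis contains all terms up to degree three, the fit reproduces any planar surface exactly, which is a useful sanity check.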
NN finds the closest subset of input samples to a query point and weights them by proportionate areas to interpolate a value [47]. It is also known as Sibson or “area-stealing” interpolation. It is local, using only a subset of samples surrounding the query point, and the interpolated heights are guaranteed to lie within the range of the samples used. It does not infer trends and will not produce peaks, pits, ridges or valleys that are not already represented by the input samples [48]. The resulting surface passes through the input samples and is smooth everywhere except at the sample locations.
To give a fair comparison with the proposed method in terms of interpolation accuracy and computing time, all the aforementioned interpolation methods were implemented in MATLAB, and their parameters were optimized with ten-fold cross validation [39]. For OK, we used the existing ‘kriging’ function [49], where the experimental variogram was calculated with the ‘variogram’ function [50] and a theoretical variogram was fitted to it with ‘variogramfit’ [51]. Since variogram fitting is very time-consuming, only five thousand points, randomly selected from the training dataset, were used to fit the variogram. The other interpolation methods can be easily coded in MATLAB.
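The ten-fold cross-validation used for parameter tuning can be sketched as follows, shown here for the IDW power exponent. This is an illustrative, self-contained Python sketch (not the authors' MATLAB code); the candidate powers and the inline IDW predictor are assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

def idw(xy_tr, z_tr, xy_te, k=12, power=2.0):
    """Minimal k-nearest-neighbor IDW predictor."""
    dist, idx = cKDTree(xy_tr).query(xy_te, k=k)
    w = 1.0 / np.maximum(dist, 1e-12) ** power
    return (w * z_tr[idx]).sum(axis=1) / w.sum(axis=1)

def cv_select_power(xy, z, powers=(1.0, 2.0, 3.0), n_folds=10, seed=0):
    """Return (power, RMSE) with the lowest ten-fold cross-validated RMSE."""
    folds = np.random.RandomState(seed).permutation(len(z)) % n_folds
    best = None
    for p in powers:
        err = []
        for f in range(n_folds):
            tr, te = folds != f, folds == f       # 9 folds train, 1 fold test
            err.append(idw(xy[tr], z[tr], xy[te], power=p) - z[te])
        rmse = np.sqrt(np.mean(np.concatenate(err) ** 2))
        if best is None or rmse < best[1]:
            best = (p, rmse)
    return best
```

The same loop structure applies to the other interpolators; only the predictor and its candidate parameter grid change.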
4.2.1. Private Dataset
The study site, with an area of 2 ha and a mean slope of about 20°, is located in the Alptal valley in Switzerland. It is characterized by a strong micro-relief with many ridges and depressions, and is covered with trees and vegetation [52]. The airborne LiDAR dataset was collected using a Riegl VZ-1000 mounted on a helicopter at a flight altitude of 500 m above ground and a speed of 30 knots. The average pulse density was 16.9 pulses/m². Since the raw point cloud contains both ground and non-ground points, the lasground function in the LAStools software package was used to automatically filter it. After filtering, 146,809 ground points with a density of 7.34 points/m² were obtained, i.e., an average point spacing of 0.37 m. The ground points were randomly separated into training (90%) and testing (10%) datasets, which were used to generate DTMs and to assess the interpolation accuracy, respectively.
Similar to the numerical test, the proposed method with different CSRBFs was used for DTM construction, and its performance with RBF centers selected by the surface variation-based algorithm was compared with that of random selection. Since data density and grid size have significant effects on the quality of DTMs [11,53], 100%, 50%, 30% and 10% of the points were randomly extracted from the training dataset, and for each data density, DTMs with five different resolutions (0.3, 0.5, 1, 1.5 and 2 m) were produced by each interpolator. In total, 4 × 5 × 11 = 220 DTMs were generated. Note that our purpose was not to assess the absolute accuracy of the DTMs, because the testing points have an accuracy level similar to the training data. Instead, the testing points were employed to evaluate how well the interpolators compute heights at locations not covered by the training points. The flowchart of the accuracy comparison between the interpolation methods is shown in Figure 6.
Note that with 100% of the training data, 146,809 × 0.9 ≈ 132,128 points are used to construct DTMs. The smoothing and exact RBFs thus require a memory cost of at least 130.1 GB, whereas the proposed method requires only 263 MB when 13,128 RBF centers are selected and 1313 sample points lie within the support radius of each center. Due to the large dataset, the exact and smoothing RBFs could not be run, with MATLAB reporting an ‘out of memory’ error; their results are therefore not shown in this example.
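The 130.1 GB figure follows from storing a full n-by-n double-precision interpolation matrix for n = 132,128 points, which can be checked in a line of arithmetic:

```python
# Dense symmetric RBF system for n training points, 8 bytes per double:
n = 132_128
dense_gb = n * n * 8 / 2**30   # full n-by-n matrix in GiB
print(round(dense_gb, 1))      # 130.1
```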
As shown in Table 5, irrespective of DTM resolution and data density, the proposed method with RBF centers selected by the surface variation-based algorithm generally outperforms random selection, especially for high-resolution DTM generation. For example, for CSRBF1 with 100% of the data, the surface variation-based selection improves accuracy over random selection by 65.9% at 0.3-m resolution, but by only 4.7% at 2-m resolution. CSRBF0 appears slightly more accurate than the other CSRBFs. This conclusion is inconsistent with that from the numerical test, which may be attributed to the surface of the study site being so rough that the higher degree of smoothness assumed by some CSRBFs cannot be met.
Interpolation results (Table 6) indicate that data density and spatial resolution have a great influence on DTM accuracy: accuracy consistently increases with finer resolution and denser data points. For example, as the data percentage decreases from 100% to 10% at a resolution of 0.3 m, the RMSE of the proposed method rises from 4.86 to 9.21 cm, an increasing ratio of 89.5%, while the increasing ratios of IDW, OK, linear, NN and bicubic are 117.4%, 82.9%, 101.4%, 103.4% and 121.9%, respectively. This demonstrates that the proposed method, which is on par with OK, is more stable to data density variation than the other methods for generating high-resolution DTMs. In comparison, bicubic interpolation is the most sensitive to data density variation, closely followed by IDW. As the spatial resolution coarsens from 0.3 to 2 m with 100% of the data, the RMSE of the proposed method varies from 4.86 to 17.63 cm, an increasing ratio of 262.8%, while the increasing ratios of IDW, OK, linear, NN and bicubic are 206.1%, 288.9%, 273.9%, 280.6% and 226.7%, respectively. Thus, our method is more robust to DTM resolution than OK under high data density.
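The increasing ratio used throughout is the relative RMSE change with respect to the baseline value; the quoted 89.5% and 262.8% figures can be reproduced directly:

```python
def increasing_ratio(rmse_base, rmse_new):
    """Relative RMSE increase (%) with respect to the baseline RMSE."""
    return (rmse_new - rmse_base) / rmse_base * 100

print(round(increasing_ratio(4.86, 9.21), 1))   # 89.5  (100% -> 10% data)
print(round(increasing_ratio(4.86, 17.63), 1))  # 262.8 (0.3 m -> 2 m)
```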
Figure 7 shows that the proposed method is clearly superior to the other methods with respect to the average RMSE over the five resolutions, especially at high data density, and that IDW always has the lowest performance. This further indicates the high robustness of the proposed method to spatial resolution change.
RMSE, as a global accuracy measure, may give a biased assessment of the interpolation methods [54]. Thus, shaded relief maps of all methods were generated. The results demonstrate that the IDW surface is very coarse and has some spurious depressions, such as those circled in Figure 8a. OK preserves the terrain features very well (Figure 8b), but has a relatively coarse appearance. Both linear (Figure 8c) and NN (Figure 8d) suffer from a peak-cutting problem, producing many flat areas with an abnormal appearance. The bicubic surface is fragmented around some terrain features (Figure 8e). By contrast, the proposed method achieves a good tradeoff between noise removal and the preservation of subtle features (Figure 8f).
The computational costs (Table 7) show that IDW is the fastest method, as expected, since IDW does not require the solution of a linear system. Linear, NN and bicubic have similar computational costs. OK has the largest computational cost, because it requires fitting the semivariogram and solving a linear system with a dense matrix. The proposed method is faster than OK but slower than the other methods; its advantage over OK comes from its sparse interpolation matrix and from using fewer RBF centers than sample points. On average, the proposed method is about four times faster than OK.
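The sparsity advantage can be illustrated on a toy 1-D problem: with a compactly supported basis, each row of the interpolation matrix has nonzeros only for neighbors within the support radius, so the system can be assembled and solved in sparse form. This sketch assumes a Wendland-type C0 function (1 − r/s)₊² and is not the authors' implementation:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

rng = np.random.RandomState(0)
n, s = 2000, 0.02                 # sample count and support radius
x = np.sort(rng.rand(n))
rows, cols, vals = [], [], []
for i in range(n):
    j = np.flatnonzero(np.abs(x - x[i]) < s)   # neighbors inside the support
    rows += [i] * len(j)
    cols += j.tolist()
    vals += ((1 - np.abs(x[j] - x[i]) / s) ** 2).tolist()
A = sparse.csr_matrix((vals, (rows, cols)), shape=(n, n))
z = np.sin(2 * np.pi * x)
coef = spsolve(A, z)              # sparse direct solve of the RBF system
```

A dense solve would touch all n² entries, while here only a few percent of the matrix is stored and factorized, which is the source of the speedup over OK's dense system.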
4.2.2. Public Datasets
Ten benchmark datasets with different terrain characteristics, provided by ISPRS Commission III [55], were further employed to assess the performance of the proposed method and the aforementioned interpolation methods. The proposed method with CSRBF0 and RBF centers selected by the surface variation-based algorithm was adopted, owing to its high interpolation accuracy on both the simulated and private datasets. Since the label of each point (i.e., ground or non-ground) in the benchmark datasets is known, only the ground points were used to generate DTMs. The workflow was similar to that of Figure 6, with two differences: (1) the sample points were not thinned, due to the low sample density, and (2) the spatial resolution was set to 0.1 m to reduce the effect of location error on DTM accuracy [56]. The numbers of training points (#training) and testing points (#testing), the area and the data density of each sample are listed in Table 8, together with the standard deviation (STD) of elevations describing the terrain characteristics. The ten samples clearly differ in both area and terrain complexity.
Figure 9 demonstrates that, irrespective of the interpolation method, terrain characteristics have a significant effect on DTM accuracy: as terrain complexity increases, DTM quality decreases. For example, regardless of the interpolation method, samp21 and samp31, which are less complex than the other samples, obtain much better results, while samp11, with the greatest terrain complexity, has the worst result, followed by samp52. The proposed method consistently performs better than the other methods on all samples, especially on complex terrains such as samp11, 41, 52 and 61. This indicates that the new method is the most robust to terrain characteristics, followed by OK. IDW ranks lowest, yielding the lowest accuracy on four samples (samp41, 51, 52 and 71). The linear and bicubic interpolators rank second lowest, each yielding the lowest accuracy on three samples (samp11, 12 and 31 for linear, and samp21, 22 and 61 for bicubic).
The shaded relief maps of all the interpolation methods on samp51 were generated (Figure 10). Samp51 was selected mainly for its complex terrain, with obvious breaklines and subtle terrain features. The results indicate that IDW has the coarsest surface, full of bumps, especially on the ramp (Figure 10a). The linear method shows obvious discontinuities due to the variable distribution of sample points (Figure 10c). NN is much better than linear, yet it still suffers from the peak-cutting problem (Figure 10d). The bicubic surface is so smooth that almost all subtle terrain features are lost (Figure 10e). OK produces a much better surface than the classical interpolation methods (Figure 10b), but it is subject to tree-root-like discontinuities in the right-middle area. In comparison, the proposed method produces a visually appealing surface (Figure 10f), although some subtle terrain features are blurred.
The computing costs of all the interpolation methods on the ten public datasets are much larger than those on the private dataset (Table 7 and Table 9), because the number of DTM grid cells is much larger for the former. For example, among the ten public datasets, samp21, which has the smallest area, comprises 1152 × 1238 grid cells at 0.1-m resolution, while the private dataset, with an area of 2 ha, has only 723 × 535 cells at 0.3-m resolution. However, the ranking of the interpolators with respect to computing cost is unchanged: IDW is the fastest, followed by NN, linear and bicubic; OK has the largest computing time; and the proposed method is faster than OK, yet slower than the other methods. On average, the proposed method is about four times as fast as OK.