1. Introduction
With the rapid development of MLS, the PCD obtained by MLS is widely used to express the 3D surface information of roadside objects [
1,
2]. The result of extracting trees individually and capturing the attributes of trees from MLS point clouds can be widely used in various applications, such as urban road planning, street tree 3D modeling [
3], street tree monitoring [
4], tree species identifying [
5], and biomass estimation [
6]. However, MLS point clouds of an outdoor scene are usually characterized by complex, diverse outdoor scene objects, and the different densities distribution of tree point cloud. Furthermore, trees usually are spatially overlapping with other non-tree objects (e.g., lamps, billboards) and tree crowns. These attributes pose significant challenges to detect individual trees from scanned outdoor scene.
In recent years, some automated methods based on MLS have been proposed [
7]. There are many scientific contributions aiming to segment scanned urban scenes into different objects [
8,
9,
10] and capture the attributes of trees [
11,
12,
13,
14,
15] (e.g., tree height, trunk diameter and diameter at breast height), and more outstanding work on 3-D object detection based on LiDAR data emerges [
16]. In this work, we focus on the current methods for individual trees detection from MLS. These methods can be roughly divided into three categories, i.e., the normalized cut methods (NCut) [
17,
18,
19,
20], the region growing methods [
21,
22,
23,
24], the clustering-based methods [
25,
26,
27,
28].
To improve the classification accuracy, Xu et al. [
18] spatially smoothed the semantic label results obtained by Random Forest classifier via a regularization process, and then extracted the individual trees based on NCut. However, NCut only considers the distance, resulting in inaccurate boundary segmentation of tree crowns. In addition, the prior knowledge of the number of trees cannot be obtained, over-segmentation and under-segmentation are prone to occur. Zhong et al. [
19] used the improved NCut to segment overlapping regions to obtain the individual trees. However, when there are poles near the trunk, the height threshold has a great influence on the trunk detection. The individual trees detection based on NCut needs to manually estimate the number of trees in a multi-tree cluster to determine the iteration termination condition. NCut requires large storage space and is inefficient when the PCD is dense, so NCut is mostly used for fine segmentation of under-segmentation overlapping objects.
Bonneau et al. [
23] divided the PCD into voxels and clustered the connected voxel units based on region growing. And then judged whether this was correctly segmented by analyzing the spatial range and eigenvalue ratio of the clustering units, to further refine the under-segmented clusters and merge the over-segmented clusters. However, this method requires complete tree structure information, and it will fail when the tree data is incomplete. Luo et al. [
24] proposed a deep network for semantic segmentation to extract tree points from raw point clouds. A pointwise direction embedding deep network (PDE-Net) is proposed to predict the direction vector of each tree cluster pointing to the tree center to improve the tree boundary segmentation accuracy. On this basis, a tree center detection method based on pointwise direction aggregation is proposed, and finally, extract individual trees based on the detected tree center as the seeds of the region growing. However, the direction prediction is inaccurate when the classification accuracy of tree points is low, and satisfactory extraction results cannot be obtained. A region growing-based method may not be able to obtain correct segmentation results due to improper selection of seeds or inaccurate feature extraction. Especially when trees are adjacent to some pole-like objects, it is difficult to separate trees and roadside pole-like objects, so there are major flaws in extracting individual trees from complex outdoor scenes.
Yang et al. [
25] extracted the treetop points by 3-D spatial distribution analysis and used the treetop points as the seeds of the k-means clustering to segment the individual trees. However, k-means clustering requires the number of clustering as an input parameter. Tao et al. [
26] intercepted PCD of a certain height and used DBSCAN clustering to obtain tree trunks. However, the trunks extraction result is unsatisfactory when the data density is uneven. Chen et al. [
28] extracted individual trees based on the Euclidean clustering. The Euclidean clustering does not require prior knowledge of the number of trees in the clustering process, the Euclidean distance of adjacent points needs to be compared with a user-defined threshold, which is difficult to set. When the threshold is small, tree points may be lost or over-segmented into multi-tree clusters. On the contrary, objects close to the tree cannot be separated. It is easy to cluster multiple connected trees together in complex outdoor scenes and produce over-segmentation when the tree data is missing. In addition, the clustering based methods also have certain limitations. For example, k-means based tree extraction requires the number of trees and the initial clustering position in advance. When the data is missing or the parameters are set incorrectly, the segmentation of the DBSCAN will be affected. Therefore, prior knowledge and parameter settings are very important factors when using the tree extraction method based on clustering.
Ncut has high time complexity when it dealing with complex scenes. Compared with NCut, region growth makes full use of the local features of point clouds for segmentation, but the segmentation effect depends on the growth criteria and seeds selection, and it is difficult to segment correctly when trees and pole-like objects are close to each other. Clustering-based methods can achieve better extraction results in simple scenes, but under-segmentation occurs when trees are densely distributed, and over-segmentation occurs when point cloud data is incomplete. In conclusion, for outdoor scenes with large tree spacing and small overlap between tree crowns and nearby objects, most existing methods can segment and extract individual trees well. However, in complex scenes where multiple trees are connected or trees are adjacent to other objects, the extraction results of an individual tree is unsatisfactory. In addition, the current method is also affected by the density of point cloud data, which affects the extraction results of individual trees when the point cloud data is missing or incomplete.
In this paper, to overcome the problem of low tree extraction accuracy caused by the uneven density, missing or incomplete of point clouds, we proposed a novel method which combined tree detection with multi-feature enhanced PointNet, treetop points detection and radius expansion, to achieve a coarse-to-fine individual trees extraction from MLS point clouds. The main contributions of the proposed method are as follows.
- (1)
A comprehensive framework combining semantic segmentation, treetop points locating, and radius expansion is constructed for individual trees extraction. It can accurately extract an individual tree and solve the over segmentation caused by incomplete point cloud data and uneven density.
- (2)
A tree detection method based on the semantic segmentation by multi-feature enhancement PointNet is proposed to solve the classification of multiple-categories objects in complex outdoor scenes.
- (3)
A novel individual trees extraction method is introduced for scanned urban scene. Through calculating the local maximum, the candidate treetop points are extracted. Taking the treetop points as center, the radius expansion guided method is presented for further extraction of an individual tree.
2. Materials and Methods
The proposed method mainly contains three steps: (1) tree detection based on the semantic segmentation by multi-feature enhancement PointNet, (2) optimal treetop points location based on projection, and (3) individual trees detection based on radius expansion. The overview of our proposed framework is shown in
Figure 1.
2.1. Tree Points Detection Based on Multi-Feature Enhanced PointNet Semantic Segmentation
Generally, there are various objects in the outdoor scene, such as trees, buildings, ground, poles, vehicles, and etc. Therefore, it is necessary to remove non-tree objects in the scene and extract the tree points before extracting an individual tree. With the development of deep learning, Qi et al. [
29] proposed a PointNet network that can directly process point cloud, which showed high accuracy and efficiency in semantic segmentation. We could detect trees from raw outdoor scene point cloud based on PointNet deep neural network. However, it only uses the Multilayer Perceptron (MLP) to increase the feature dimension when the PointNet model extracts local features and does not consider the neighborhood information of the point cloud, resulting in a poor description of the extracted local features. Therefore, the local features of the PCD are extracted and the coordinate values are combined to form feature vectors as the input of the PointNet network to perform semantic segmentation of complex outdoor point cloud scenes.
2.1.1. 3D Point Cloud Features Extraction
The performance of 3D PCD local features description depends on its local neighborhood information. At present, the selection of point cloud neighborhood data can be roughly divided into two methods, i.e., the k-nearest neighbor (KNN) search algorithm and the spherical local search algorithm. KNN method is a density-adaptive search algorithm that takes the k points closest to the query point as neighborhood points and can obtain a consistent number of neighborhood points in the case of uneven point cloud density, which is beneficial to improve data storage and calculation efficiency.
Given scanned scene data
, the k neighboring points of a point
be
. The normal vector estimation is implemented by a least-square plane fitting on the nearest neighbors, which is mainly based on Principal Components Analysis (PCA). Therefore, the local covariance matrix M of
is constructed as:
where
is the number of points in the point cloud,
is the centroid point of the PCD, which is calculated by
The eigenvalues are positive and ordered as
. The normal vector (
of point
can be determined by the eigenvector corresponding to
. Ning et al. [
30] applied the local features calculated by the covariance matrix to the machine learning classification algorithm for tree extraction and achieved good classification results. Based on this, we selected 6 features that have a strong description ability for outdoor scene PCD, namely linearity
, flatness
, divergence
, anisotropy
, characteristic entropy
, and curvature variation
[
31], these features can be calculated by Equation (2):
As we all know, the divergence, characteristic entropy, and curvature variation of trees are significantly higher than those of ground and buildings, while the linearity, flatness, and anisotropy of trees are lower than those of poles, buildings, and other objects. Therefore, the characteristics of different objects can be grasped more comprehensively and effectively through multi-feature fusion, and the discrimination between different objects can be improved.
2.1.2. PointNet Enhanced by Multi-Features
The disorder of the PCD makes the point cloud of different input orders get different high-dimensional features after passing through the MLP layer, which affects the feature extraction of the deep neural network. The rigid body invariance of the point cloud makes the spatial structure and shape information of the point cloud unaffected under different perspectives. Therefore, Qi et al. [
29] introduced a T-Net module and a symmetric function to reduce the influence of disordered point clouds on the segmentation results. The specific steps of semantic segmentation are as follows. The PointNet network architecture diagram is shown in
Figure 1.
The spatial coordinate features of N points are combined with local features in the data preparation stage and the input data is represented by a 9-D vector . To adapt to the new number of channels, change the T-Net (3) of the PointNet network to T-Net (9), and then multiply the original PCD by the 9 × 9 transformation matrix learned by T-Net (9) to get the aligned data. After data alignment, the information of each point is learned and extracted by the MLP layer shared by 2 layers, and an N × 64 matrix is obtained. Finally, the 64 × 64 feature space transformation matrix is predicted by T-Net (64), and the transformation matrix is applied to the N × 64 matrix to achieve feature alignment, and the aligned features are used as the local features of the point cloud.
Input an N × 64 matrix in the shared MLP, and map the data to 64-D, 128-D, and 1024-D in turn to obtain an N × 1024 matrix. Then, the maximum value of N data in each dimension is extracted through the max pooling operation to obtain the global features of the point cloud. The aligned N × 64 local features and 1 × 1024 global features are spliced through the fully connected layer to obtain an N × 1088 matrix. Then, the three-layer MLP is used to classify and output the data, and a matrix of N × m is obtained, where N is the number of point clouds, m is the number of categories, and finally the semantic segmentation task of the scene is realized.
2.1.3. Filtering and Optimization
There are noisy points in the tree PCD obtained by semantic segmentation, so a filtering algorithm needs to be used to remove them. We use the pass-through filtering algorithm and the statistical filtering algorithm in Point Cloud Library (PCL) [
32] to denoise the tree points. The statistical filter is mainly aimed at scattered noise points with a small amount of data. By calculating the average distance from each point to the adjacent points, and then comparing it with the given mean and variance, the noise points outside the range are eliminated. The pass-through filter can quickly remove a large number of outliers beyond the set range by determining the extent of the PCD on the X, Y, and Z axes. The comparison diagram of filtering is shown in
Figure 2.
2.2. Treetop Points Extraction
Treetop points is the local highest points of crowns, which could determine the number of trees in the tree points from scene. Since there are often certain gaps between trees in urban scenes, even if the tree canopy overlaps. As the elevation increases, the horizontal spacing between the treetop points of different trees will become larger and larger. For single-row trees, it can be found that the treetop points of street trees are mostly located on the vertical plane of the tree distribution direction. According to the distribution of trees in outdoor scenes, we proposed a novel method to extract treetop points through local coordinate system (LCS) establishment, projection, and local maxima calculation.
2.2.1. Projection Direction
The trees in the outdoor scene have the characteristics that the treetop points are always the highest points. To extract accurate treetops, it is necessary to project all the trees points especially for the single-row street trees. As we all known, the outline of the projected trees is approximate to an ellipse, and the treetops are mainly located on the long axis of the ellipse. Therefore, the LCS of trees can be constructed by PCA, i.e., , , and (corresponding to ) represents the x-axis, y-axis, and z-axis, respectively. Then the plane where the x-axis and z-axis are located is selected as the projection direction.
Assume that the tree PCD in outdoor scene is
. The centroid
of all the data in
is calculated by
, where
is the number of tree points and
,
. Then, the tree points are projected onto the XOZ plane, and the point set after projection is
, which is shown in
Figure 3.
We can calculate the coordinate of
by the Equation (3):
where
is the normal vector of the XOZ plane.
.
The LCS is established for the single row of outdoor trees data (shown in
Figure 4a,4b). We projected the tree points onto the XOY plane (shown in
Figure 4c) and the XOZ plane (shown in
Figure 4d) of the LCS, respectively. We can see that it is easier to extract the treetops of the tree by projecting onto the XOZ plane.
2.2.2. Optimal Treetop Points Extraction
The tree points data after projecting onto the XOZ plane could provide an easy way to obtain the most unobstructed treetop points and is a good representation of the shape of the tree canopy. Based on this, we propose an optimal treetop points extraction method by three steps: (1) local maxima calculation. (2) candidate treetop points locating. (3) optimal treetop points extraction.
- (1)
Local maxima calculation
For the projection points on the XOZ plane, it is necessary to extract local maxima from the projection points to reduce the extraction range of treetop points and improve computational efficiency. Firstly, for points and on the projected contour, the redundant data are removed. That is to say, if and , one of the points are kept. If and , remove the point . Then sort all the projected points in ascending order of x coordinate to get the point set . Next, is defined as the local maxima when and , and the above procedure are repeated to extract all local maxima.
Figure 5 displays the comparison of local maxima extraction results before and after filtering redundant data.
Figure 5a,b shows the raw PCD and the local maxima before filtering.
Figure 5c indicates the data after the redundant points are removed, and the further extracted local maxima are shown in
Figure 5d. It is worth noting that the local maxima obtained from the filtered scene are located on the outer contour position of the tree crown and more conducive to the extraction of subsequent treetop points.
- (2)
Candidate treetop points locating
Based on the local maxima, a critical step in treetop points extraction is to locate candidate treetop points. For one tree, the changing trend of the crown contour points is to expand outward from the treetop points. Therefore, we locate the candidate treetop points according to the variation of the z-coordinate of the PCD. First, all local maximums are sorted in ascending order of x, denoted as
.
Nm is the number of the points of local maximum. Then, the difference
DMi of point
on the
z-axis are calculated by Equation (4).
Theoretically, the treetop point generally has maximum value of x coordinate among all its neighborhood points. Therefore, we locate candidate treetop points by detecting those points where their
DM varies from positive to negative. As the
x-axis coordinate value continues to increase, there will be some randomly distributed noise points on the
z-axis coordinate. To reduce the influence of noise, the difference needs to be smoothed. Thus, a two-step-based method is developed to detect candidate treetop points, i.e., the noisy points are deleted by smooth filtering (step 1), and then judge the symbol of DM. In step 1, for point
we search its
k nearest neighboring points and calculate the average difference
and get the smoothed difference
. In step 2, the sign function is used to judge the positive and negative of smoothed difference of the point
mi:
If
, it means that the smooth difference will change from positive to negative, so the point
is regarded as a candidate treetop point. This procedure is repeated, and then the set of candidate treetop points is obtained as
(see
Figure 6a).
- (3)
Optimal treetop points extraction
The candidate treetop points obtained not only contain correct treetop points but also include some local extreme points with lower heights or redundant points with close distances between data. Thus, it is important to filter or remove those data that do not belong to the real treetop points. In our paper, two criterions are introduced. One is tree height and the other is distances between treetop points.
To begin with, the points that do not conform to the tree height are eliminated by judging whether the z-axis coordinate of each candidate treetop point is less than the height threshold
zth. We set the treetop point to be above 1/2 of the height of the entire tree scene, and calculate the height threshold
zth according to the Equation (6):
where
and
are the maximum z-coordinate and the minimum z-coordinate of tree PCD, respectively.
Then judged the distance between each pair of candidate treetop points. Those treetop points that are very close to each other are merged and optimized. Calculate the Euclidean distance between all candidate treetop points, and then sort the distances in ascending order. If the distance between the nearest pair of treetop points is less than the distance threshold
dth (value is 0.5 m), the current two treetop points are replaced with their center points. After that, the distance between the updated treetop points is recalculated and evaluated. The optimized treetop points (see
Figure 6b) are obtained until the distance between each two candidate treetop points are greater than
dth.
The front view and top view of the candidate treetop points extracted from
Figure 4d are shown in
Figure 6a. The optimal treetop points obtained after filtering and merging the candidate treetop points are shown in
Figure 6b.
2.3. Radius Expansion Based Individual Tree Extraction
The challenge task of individual tree extraction is instance-level separation for spatially overlapping tree points [
24]. After getting all the treetop points in the scene, we extract an individual tree in the outdoor scene based on the radius expansion.
The core steps of our proposed algorithm include initial clustering by analyzing the treetop points, initial bounding box and expansion circle determination, high-level layering for tree PCD and individual trees extraction by radius expansion.
Given the optimal treetop points
,
is the number of the treetop points. The initial clustering is carried out with treetop points as the center. The specific steps are as follows: First, establish a KD-tree (k-dimensional tree, KdTree) with the point
as the seed point. Then take the seed point as the center of the sphere and set the radius of the ball to
. Next, cluster the data points in the range of
IR with the seed points (
Figure 7) to get the initial cluster
. In addition, this process is executed iteratively, until the spherical neighborhood data of all treetop points are divided, and the clusters are obtained, as shown in
Figure 7.
The purpose of clustering is to get the initial position where the bounding box and the extended circle are located. According to the initial clustering results, the maximum
and minimum
of all points in the cluster
to form the initial boundary set
Boui,
. We calculate the radius
Ri and the center
of the extended circle where the cluster
is located according to Equations (7) and (8), respectively.
After that, the boundary set of all clusters is , the radius of the expansion circle is where the cluster is located, and the center of the circle is .
It is necessary to slice the tree PCD in the scene after obtaining the initial boundary, as shown in
Figure 8. Set the number of split layers to
Nl, then extract the maximum
z-axis coordinate
zmax and the minimum
z-axis coordinate
zmin from the tree points and calculate the height
Hl of each layer of data according to Equation (9).
The point set of each layer from top to bottom is , the number of points in the layer is . There are two cases to segment tree points:
(1) If point
in layer
is within the range of
of the cluster
in layer
, and the horizontal distance
from point
to the center
is less than the radius
, then point
belongs to the tree where cluster
is located, and point
is assigned to cluster
,
is calculated by Equation (10):
where
is the
-th (
) point of the layer
,
(
) is the center of the cluster
(
).
(2) If the point
does not belong to any cluster, the circular distance
corresponding to the point
and each extended circle is calculated by Equation (11), then sort
in ascending order, and assign point
to the cluster corresponding to the smallest distance,
where
(
) is the radius the extended circle of cluster
,
.
After processing the data of layer
according to the above two cases, update the radius and center of the circle according to Equations (7) and (8), and then continue to segment the data of layer
until all data processing is completed, that is, the extraction of individual trees is completed. A diagram of these two cases is shown in
Figure 9. In addition, the individual trees of scene 1 obtained by the layer-by-layer radius expansion method is displayed in
Figure 10.
4. Conclusions
In this paper, a new method is proposed for individual trees detection from MLS point clouds, which can be used in street tree 3D modeling, street tree monitoring, tree species identifying, and biomass estimation. Our method consists of (1) non-tree points removal and tree detection via multi-feature enhanced PointNet, (2) locating treetop points via filtering the tree point projection plane and optimized treetop points by a distance rule, and (3) after the initial clustering of treetop points and vertical layering of tree points, a top-down layer-by-layer segmentation based on radius expansion to realize the complete individual extraction of trees. The experimental results derived from the Oakland 3D Point Cloud dataset demonstrate that benefiting from the accuracy of scene semantic segmentation and the proposed method can effectively extract the individual trees. Compared with the other two methods, the proposed method can effectively avoid the influence of artificial roadside pole-like objects and the crown overlaps. Overall, the precision, recall and F-score of instance segmentation on the used datasets are 98.33%, 98.33% and 98.33%, respectively.
In future work, we will improve the robustness of the method to adapt to forests. Additional deep learning can also be explored with goal of improving tree classification accuracy. Meanwhile, the fusion of the orthophoto image and the LiDAR point clouds would provide a better way to greatly improve the efficiency and the accuracy of urban trees detection, especially for the larger scale urban scenes.