1. Introduction
China's forest area is 3.46 billion mu, ranking fifth in the world; its forest stock volume is 19.493 billion cubic meters, ranking sixth; and its area of planted forest, 1.314 billion mu, ranks first in the world. The total carbon storage of forests and grasslands reaches 11.443 billion tons, among the highest in the world. The key to the intelligent development of forestry lies in quickly and accurately obtaining rich three-dimensional information on forest areas and tree structures to achieve precise mapping and monitoring of forest resources. Traditional manual survey and measurement methods are inefficient and cannot meet modern forestry's demand for large-scale, efficient acquisition of three-dimensional structural data.
To replace time-consuming, labor-intensive, and inaccurate manual surveys, forest environmental surveys commonly rely on hyperspectral data, imagery, and point clouds. LiDAR-derived point clouds record the three-dimensional shape of targets. By remotely scanning an entire forest area without contact, high-density three-dimensional point clouds of individual trees, and even whole stands, can be obtained, fully capturing the three-dimensional structure of tree crowns, trunk shapes, branch architecture, and more. Point cloud data accurately reflect the three-dimensional scene of a forest and can be used to construct and present digital forest systems. These rich point clouds support detailed digital models of forest structure, from which parameters such as tree species, tree height, diameter at breast height, and crown width can be extracted. This enables accurate measurement and dynamic monitoring of the spatial distribution and growth status of trees, greatly improving the accuracy of forest resource assessment in an area. Furthermore, point clouds from different periods can be compared to monitor dynamic changes such as growth and logging and to assess increases or decreases in forest resources. Detailed digital forests built from point clouds can also be used to simulate forestry operations, such as transportation planning and logging, thereby improving operational efficiency. In summary, compared with manual measurement, three-dimensional LiDAR scanning covers a larger range, obtains more comprehensive three-dimensional structural data, is not limited by terrain and environment, and is more efficient and convenient to operate.
Since its inception in the 1960s, laser technology has undergone significant advancements, particularly in measurement accuracy. Consequently, its applications have proliferated across diverse sectors, including the military, industrial manufacturing, civil engineering, agriculture, and forestry. Within forestry, airborne light detection and ranging (LiDAR) technology has reached a state of maturity, having been employed as early as the 1980s. Pioneering work by Ross Nelson [
1] and colleagues demonstrated the utility of airborne LiDAR for measuring vertical forest features, such as tree height and ground distance. Their findings indicated that the margin of error for tree height measurements was less than one meter when compared to photogrammetric techniques. Schreier [
2] utilized airborne LiDAR technology to scan forested areas and demonstrated that laser-generated point clouds could precisely differentiate between the ground and various types of vegetation. Furthermore, the technology is capable of distinguishing between coniferous and broadleaf forests based on metrics such as distance information, reflectance, and other parameters. Since the 1990s, airborne LiDAR technology has evolved, garnering increasing interest from researchers in forestry. The technology has been deployed to obtain comprehensive data on various aspects of forests, including growth factors, ecological conditions, vertical structure, and biomass over expansive areas. In contrast to airborne and satellite-based LiDAR systems, vehicle-mounted LiDAR offers the advantage of capturing more granular data on forest stand structures, owing to its high-density and high-precision point cloud capabilities. In recent years, research efforts have increasingly pivoted towards high-precision forest modeling and targeted identification. For example, Merlijn Simonse et al. explored the utility of vehicle-mounted LiDAR for forest resource surveys, extracting key parameters such as stand positions and diameters at breast height from 3D point cloud data, with a particular emphasis on data-processing methodologies [
3]. Initially, the researchers employed Z-coordinates to identify the lowest points across various horizontal planes, thereby establishing a digital ground model. Subsequently, they applied the Hough transform technique to filter the point cloud data, allowing them to accurately pinpoint stumpage positions and their respective diameters at breast height. In conclusion, the study offers prospective insights into the broader applications of LiDAR technology for comprehensive extraction of stumpage parameters, such as tree species identification, tree height, canopy area, and wood defects, among others. Chris Hopkinson [
4] employed a synergistic approach, utilizing both ground-based and airborne LiDAR systems to scan tree canopies. Subsequent calibration calculations were executed with a high level of accuracy, enabling the derivation of both leaf area index and leaf profile models within a 1 km range. In a separate study, Aleksey Golovinskiy [
5] focused on target identification in urban settings by employing a shape-feature approach. The researchers initially captured 3D point cloud data within the urban landscape and employed clustering techniques for point cloud segmentation. This enabled the differentiation of foreground and background entities. Following this, shape features were extracted, and labeled data were used for training. The final step involved the application of support vector machines for robust target recognition. In Germany, Bienert et al. [
6] utilized vehicle-mounted 3D LiDAR technology to estimate the volume of trees. Initially, a rudimentary method was employed in which the point cloud was conceptualized as a stereo pixel structure. Tree volume was then calculated based on the number of filled stereo pixels. However, this approach led to overestimation issues. Consequently, refinements were made to address these inaccuracies, resulting in a more reliable method. In another study, Fabrice Monnier et al. [
7] explored the identification of urban street trees using 3D point clouds. The team defined various tree features, including volume, linear, cylindrical, and planar characteristics. Each feature was independently recognized. A probabilistic relaxation method was then implemented to filter out noise and insignificant point cloud structures. The aggregated data indicated that, even in complex urban settings, trees could be effectively identified using individual features. In a separate study, Rutzinger et al. [
8] focused on tree recognition using 3D point clouds and constructed three-dimensional models for standing trees. They initially employed clustering algorithms to segment the point cloud data, isolating points that were situated at least 0.5 m above the ground. Subsequently, point density for each cluster was calculated. It was observed that the density of point clouds in the tree crown was substantially lower than in other areas, enabling effective tree recognition. The images in the manuscript reveal that the experiments were conducted on an urban road with sparse tree coverage, achieving an 85% recognition rate under these conditions. Hyyti et al. [
9] employed a 2D laser scanner, rotated it around a baseline to enable 3D scanning, and generated 3D point cloud data of a forest area. Utilizing circular arc features, they successfully identified tree trunks and estimated their positions. However, their method exhibited lower accuracy in calculating trunk diameters, with errors of up to 4 cm within an 8 m scanning radius. In a similar vein, Pyare Pueschel et al. [
10] utilized a FARO 3D laser scanner to scan a forest area, from which they extracted tree locations, diameters, and wood volumes. Their study also evaluated the impact of varying scanning modes and curve-fitting methods on the accuracy of diameter and volume calculations, taking into account the effect of trunk occlusion. Sandeep Gupta et al. [
11] focused on 3D modeling of the vertical structure of tree canopies and individual trees. Utilizing airborne LiDAR, they acquired point cloud data, performed statistical analyses on the height distribution, and stored the data in an octree structure for further processing. Yangyan Li et al. [
12] extended point cloud studies to the domain of plant growth analysis. By capturing 3D point cloud data over an extended time frame—referred to as 4D point clouds—they examined plant growth patterns. Although numerous methods currently exist for extracting forest parameters from laser point clouds, both domestically and internationally, they encounter challenges that hinder the broader application of LiDAR in practical forestry management.
Due to the complexity of the forest environment and the large amount of point cloud data, traditional post-processing methods require manual intervention and multiple steps, which cannot serve as the technical foundation for real-time operations and rapid surveys. Deep learning, with its powerful learning ability, has great potential in handling forest environment point clouds. Currently, deep learning has five development directions in point cloud recognition:
Volumetric-based methods: typically involve voxelizing point clouds into 3D grids and applying 3D convolutional neural networks. However, the computational requirements and sparsity of stereo data after rasterization hinder the development of this approach. Some proposed solutions, such as CNNS [
13], FPNN [
14], and Vote3D [
15], have attempted to address these challenges, but difficulties still arise when dealing with large amounts of point cloud data.
Multi-view CNNs methods: attempt to transform 3D point clouds or shapes into 2D images and utilize 2D convolutional networks for classification. While this method achieved good recognition results at the time, it is difficult to extend to large scenes and 3D tasks, such as point cloud classification. Furthermore, 2D multi-view images only approximate 3D scenes and do not provide a true and lossless representation of the geometric structure, resulting in less ideal results in complex tasks.
Pointwise MLP methods: utilize multiple shared multi-layer perceptrons to independently model each point and then aggregate global features using symmetric aggregation functions. These methods have made significant advancements in recent years. PointNet [
16], introduced by researchers from Stanford University in 2017, directly processes unordered point clouds as input data for recognition and semantic segmentation tasks. PointNet++ [
17], an extension of PointNet, addresses the limitation of extracting local information by utilizing Farthest Point Sampling (FPS) and Multi-Scale Grouping (MSG). Other methods, such as POINTWEB [
18] and PointSIFT [
19], focus on extracting contextual features from the local neighborhood of point clouds and incorporating the concept of the SIFT algorithm for point convolution.
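The farthest point sampling step used by PointNet++ can be made concrete with a short sketch. The following NumPy implementation (function and parameter names are ours, not from the PointNet++ codebase) greedily selects each next point as the one farthest from the set already chosen, which yields better coverage of the cloud than uniform random sampling:

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: repeatedly pick the point with the largest distance
    to the nearest already-chosen point.

    points: (N, 3) array; returns indices of k sampled points.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = np.empty(k, dtype=int)
    chosen[0] = rng.integers(n)
    # Distance from every point to the nearest chosen point so far.
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for i in range(1, k):
        chosen[i] = int(np.argmax(dist))
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[i]], axis=1))
    return chosen
```

This brute-force version is O(N·k); production implementations run the same logic on the GPU.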
3D convolutional kernels methods: Compared to convolutional kernels defined on a two-dimensional grid structure, designing convolutional kernels for three-dimensional point clouds poses greater challenges due to their irregularity. Current three-dimensional convolution methods can be categorized into continuous convolution and discrete convolution, depending on the type of convolutional kernel used. PointCNN [
20] addresses the difficulty of applying convolutional operations to irregular and unordered point cloud data by employing point convolution. Flownet3D [
21] extracts features and computes their correlation. Other methods, such as Spherical Convolutional Spectral CNN [
22], KPCONV [
23], and PointConv [
24], propose novel techniques for modeling the geometric relationship between neighboring points and performing convolution operations.
Graph-based methods: SuperPointGraph [
25], GCNN [
26] utilize graph convolution for the effective processing of point clouds. ClusterNet [
27] generates rotation-invariant features and constructs a hierarchical structure of point clouds using an unsupervised approach.
The advancement of deep learning methodologies has shown promise in improving the efficiency of topographic LiDAR technology for forestry applications. These methodologies offer various approaches for point cloud recognition, each with its own strengths and limitations. Further research and development in this field will contribute to the integration of laser measurement technologies into operational forestry practices. However, in practical applications, traditional direct annotation methods are time-consuming and inefficient due to the complex terrain, foliage occlusion, and uneven sparsity of forestry point clouds. Meanwhile, because vast forestry point cloud scenes are unstructured, occluded, and sparse, directly applying deep learning frameworks to forestry point clouds results in low accuracy and poor performance. Therefore, constructing a precisely annotated forestry point cloud dataset and establishing a point cloud semantic segmentation method suitable for forestry environments are of great significance.
To address these challenges and develop a more effective training model tailored to forestry scenarios, we conducted an in-depth investigation of forest landscapes. Leveraging multi-sensor fusion LiDAR technology, we collected high-quality data and performed precise semantic annotations, resulting in the creation of the “DMM” dataset.
Recognizing that forestry point cloud data are often highly obscured and present other issues, we engineered a feature extraction DMM module. This module is specifically designed to optimize the extraction of features from forestry point clouds. Additionally, we developed a semantic geometric segmentation algorithm that categorizes point clouds based on shared features. As a result, we propose an end-to-end point cloud processing framework called pointDMM.
2. Methods and Materials
In this section, we outline the annotation framework of our data acquisition system and introduce the DMM dataset for forestry scenes. Additionally, we discuss the DMM module, which is specifically designed for pre-segmenting multi-feature point clouds, and present pointDMM, a deep-learning-based method for segmenting point clouds in forestry scenes.
2.1. Study Area
Data collection was conducted at two distinct locations. As shown in
Figure 1, the first location, Gao Yang County Forestry District, is situated at coordinates 115°38′ E and 38°37′ N. It experiences an average annual rainfall of 515.2 mm. The second location, Beijing Dongsheng Bajia Park, is located at No. 5 Shuangqing Road, Dongsheng Township, Haidian District, Beijing, China. Its coordinates are N: 40°01′4.78″ and E: 116°20′40.63″. This park, the largest of its kind in Beijing, is located within the temperate monsoon zone. It has an average annual rainfall of 688.26 mm and an average annual temperature of 13.1 °C. The park spans approximately 615.83 hectares and boasts a rich diversity of plant species, including 21,700 trees with significant crown and flower coverage. The green area coverage exceeds 90%. In this study, we focus on a plantation forest spanning about 20 mu, primarily composed of Tsubaki and Populus species.
2.2. Data Collection Platform
We begin this section by providing details of the laser scanner used for collecting point cloud data. As shown in
Figure 2, our backpack-type acquisition system utilizes the RS-LiDAR-16, developed by Shenzhen Suteng Juchuang Technology Co., Ltd. (Shenzhen, China). This state-of-the-art LiDAR unit is designed for applications in autonomous vehicle environment perception, robotics, and UAV mapping. The RS-LiDAR-16 employs a hybrid solid-state LiDAR approach, integrating 16 laser transceiver components capable of measuring distances up to 150 m with an accuracy of ±2 cm. The unit produces up to 300,000 data points per second, offering a horizontal field of view of 360° and a vertical field of view ranging from −15° to 15°. The device performs well even under adverse visibility conditions such as sandstorms, haze, rain, or dense vegetation. During our fieldwork, we collected a total of 197 point clouds over an area of 8.1 hectares, stored in LAS 1.4 format. Given the large volume of data, we have organized all the experimental results mentioned in this paper. The intermediate point clouds from the experiments total 8.8 GB and have been uploaded to Quark Cloud Drive; see
https://pan.quark.cn/s/fdce3d6aedac (accessed on 30 October 2023) (
Figure 2).
LiDAR-based SLAM technology has made significant progress in recent years, in both theoretical research and practical applications. With the advancement of sensor technology and computing power, we adopted SLAM technology for point cloud collection in forestry environments. SLAM works stably in complex and changing forest environments without the need for prior environmental information or GPS signals. It is cost-effective compared with traditional aerial or satellite remote sensing, as SLAM-based ground point cloud collection has lower costs and can be updated more frequently. Additionally, SLAM systems can be installed on various mobile platforms, such as drones, mobile robots, or handheld devices, providing flexible data collection solutions for different forestry applications. SLAM also allows researchers and forestry workers to view point cloud data in real time, enabling timely decisions and adjustments, and, compared with traditional forestry measurement methods, SLAM-based point cloud collection has a lower impact on the environment. Furthermore, SLAM enables data collection at scales ranging from individual trees to entire forests, providing abundant resources for forestry research and management. The application of SLAM technology in forestry point cloud data collection thus presents both new opportunities and challenges. In this article, the Roboscene LiDAR-fusion inertial navigation backpack SLAM collection system and the LIO-SAM [
28] algorithm are utilized for data collection. By tightly coupling the LiDAR and inertial measurement unit (IMU) odometry, the system achieves high-precision, real-time trajectory estimation and map construction through smoothing and mapping techniques. The system builds a laser-inertial odometry based on a factor graph, allowing the integration of relative and absolute measurements from multiple sources, including loop-closure detection. To remove distortion from point clouds, the system utilizes pre-integrated IMU data, which also provide an initial estimate for laser odometry optimization. Through scan matching, selective keyframe introduction, and an efficient sliding-window strategy within a local range, LIO-SAM ensures real-time performance in forestry applications while maintaining high-precision trajectory estimation and map construction in forest environments.
The implementation of the LIO-SAM algorithm consists of several steps. First, preprocessing corrects the distortion of the input point cloud using IMU data, and the point cloud is segmented into ground and non-ground points. Next, laser odometry is estimated by aligning the current frame with the previous frame's point cloud to determine the relative transformation between the two frames; the initial estimate provided by the IMU data assists in this alignment. Subsequently, factor-graph optimization is conducted by adding the laser odometry and IMU pre-integration data as factors to the factor graph. When a loop closure is detected, loop-closure constraints are added to the factor graph; loop closures are found by matching the current frame against historical frames using point cloud descriptors (FPFH). Optimization libraries, such as GTSAM or g2o, are employed to optimize the factor graph and obtain a globally consistent trajectory and map. The fused point clouds are then combined with the optimized trajectory to construct a global map; optionally, map sparsification or downsampling can be performed to enhance efficiency. Finally, the optimized trajectory and 3D map are output, enabling further path planning or navigation.
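The role of factor-graph optimization in this pipeline can be illustrated with a toy example. The sketch below is ours (not code from LIO-SAM or GTSAM): it optimizes a one-dimensional pose graph containing a prior factor, odometry factors, and a single loop-closure factor by linear least squares. Real systems solve the nonlinear 6-DoF analogue, but the structure, each measurement contributing one row (factor) constraining a few poses, is the same:

```python
import numpy as np

def optimize_pose_graph(odometry, loop, anchor=0.0):
    """Toy 1D pose-graph optimization.

    odometry: measured displacements between consecutive poses.
    loop: (i, j, z) loop-closure factor measuring the displacement
          from pose i to pose j.
    Returns optimized 1D poses, with pose 0 softly anchored at `anchor`.
    """
    n = len(odometry) + 1
    rows, b = [], []
    # Prior factor pinning the first pose (removes gauge freedom).
    r = np.zeros(n); r[0] = 1.0
    rows.append(r); b.append(anchor)
    # Odometry factors: x[i+1] - x[i] = u_i
    for i, u in enumerate(odometry):
        r = np.zeros(n); r[i] = -1.0; r[i + 1] = 1.0
        rows.append(r); b.append(u)
    # Loop-closure factor: x[j] - x[i] = z
    i, j, z = loop
    r = np.zeros(n); r[i] = -1.0; r[j] = 1.0
    rows.append(r); b.append(z)
    A = np.vstack(rows)
    x, *_ = np.linalg.lstsq(A, np.array(b), rcond=None)
    return x
```

With four unit odometry steps and a loop closure reporting a total displacement of 3.8, the optimizer spreads the 0.2 drift evenly across the trajectory, which is exactly the correction behavior loop closures provide in SLAM.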
Overall, the utilization of SLAM technology in forestry point cloud data collection offers numerous benefits, including minimal environmental impact, multi-scale data collection capabilities, and improved precision in trajectory estimation and map construction. The LIO-SAM algorithm, together with the Roboscene LiDAR-fusion inertial navigation backpack SLAM collection system, provides a comprehensive solution for efficient and accurate forestry research and management (as shown in
Table 1).
To facilitate the training of deep learning models, we have organized the acquired data into three distinct, scaled datasets: DMM-1, DMM-2, and DMM-3. Each dataset is tailored to a different scene scale, providing a comprehensive range of forestry environments. DMM-1 focuses on individual trees, enabling accurate segmentation at a fine-grained level and serving as an ideal testing ground for single-tree segmentation. DMM-2 emphasizes the semantic segmentation of local tree populations, allowing more detailed analysis of localized forestry environments. DMM-3 targets large-scale, multi-tree scenarios, enabling the assessment of expansive forest landscapes. Overall, these datasets offer a valuable resource for researchers and practitioners, facilitating the development and evaluation of advanced models for forestry analysis. The original point cloud can be seen in
Figure 3.
2.3. Tree-Based Localization-Based Forestry Point Cloud Dataset Annotation Method
There is currently no accurately annotated dataset for multi-class forestry identification and segmentation based on mobile information collection platforms in forestry environments. Rapid mobile measurement platforms in forests face difficulties in annotating large-scale, variable-density, unstructured point clouds: the annotation process is inefficient and yields low accuracy. Additional challenges arise from sparse and occluded trees and other objects, as well as ill-defined fractal structures, which can lead to annotation errors. Existing point cloud datasets are insufficient for processing complex forestry environment information, as they lack data with severe occlusion, high density, complex terrain, multiple-return information, and uneven scale. To address these issues, we propose a method for annotating large-scale forestry scene data based on single-tree positioning. Compared with commonly used outdoor datasets such as Semantic3D, the forestry point cloud dataset has its own characteristics. Semantic3D is currently the largest and most popular static dataset, in which each frame is measured from a fixed position using a ground-based LiDAR scanner. The main categories in this dataset are ground, vegetation, and buildings, with few moving objects. It includes 3D semantic scenes from rural and urban areas, with three distinct suburban categories, and the proportions of each category vary.
Due to the forestry environment, our dataset mainly consists of live standing trees without any buildings or pedestrian information. Therefore, our point cloud annotation method is based on single-tree localization and pre-segmentation. The method includes the following steps: loading point cloud labels, denoising and filtering normalization of the point cloud, calculating DBH (diameter at breast height) and CHM (canopy height model), pre-segmentation of individual trees in the point cloud, and fine annotation based on pre-segmentation. As shown in
Figure 4, the specific steps are as follows:
1. Use the Semantic Segmentation Editor to load point cloud labels: Semantic Segmentation Editor is an open-source, web-based semantic object annotation tool developed by the Hitachi Automotive And Industry Lab. It was designed for creating training data for machine learning semantic segmentation in autonomous driving research and development, but it can also be used to annotate other types of semantic object databases, and it supports 3D point clouds generated by LiDAR (in .pcd format). This article provides instructions for installing Meteor and configuring the environment on Ubuntu 18.04. Its address is
https://github.com/GerasymenkoS/semantic-segmentation-editor (accessed on 11 January 2023).
2. Denoising: The denoising of point clouds is a critical preprocessing step aimed at eliminating noise and outliers from 3D point cloud data. Point cloud data often suffer from noise due to limitations of 3D scanning devices or imperfect reconstruction techniques. The objective of denoising is to restore the true structure of the point cloud while preserving its essential features. The process analyzes each point to determine whether it is noise and adjusts or removes it based on the attributes of its neighboring points. Efficient denoising not only enhances the accuracy of subsequent tasks, such as 3D reconstruction, classification, and recognition, but also enables the identification of outliers from the distribution of distances between each point and its neighbors. The denoising algorithm proceeds as follows: establish the k nearest neighbors of each point P, where k is a predefined parameter; calculate the average distance μ and standard deviation σ from P to its neighbors; define a threshold T = μ + α·σ, where α is a predefined coefficient controlling the strictness of denoising; remove outliers, i.e., if the distance from P to a neighbor exceeds the threshold T, consider P an outlier and remove it from the point cloud; repeat the above steps until the number of outliers falls below a predefined threshold or the maximum iteration count is reached; and output the denoised point cloud. Following these steps, the algorithm effectively removes noise and outlier points, yielding a denoised point cloud that accurately represents the true structure of the data.
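The k-nearest-neighbor outlier test described above is commonly implemented as statistical outlier removal, in which the threshold T = μ + α·σ is computed from the mean and standard deviation of the per-point average neighbor distances over the whole cloud. A brute-force NumPy sketch of this variant (function name and defaults are ours; real pipelines use a KD-tree instead of the full distance matrix):

```python
import numpy as np

def remove_statistical_outliers(points, k=8, alpha=2.0):
    """Drop points whose mean k-NN distance exceeds T = mu + alpha * sigma,
    where mu and sigma are taken over the whole cloud.

    points: (N, 3) array; returns the filtered cloud and the kept mask.
    """
    # Full pairwise distance matrix: O(N^2), for clarity only.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Mean distance to the k nearest neighbors (column 0 is the point itself).
    knn = np.sort(d, axis=1)[:, 1:k + 1]
    mean_knn = knn.mean(axis=1)
    t = mean_knn.mean() + alpha * mean_knn.std()
    keep = mean_knn <= t
    return points[keep], keep
```

Libraries such as Open3D expose the same operation with equivalent `nb_neighbors`/`std_ratio` parameters.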
3. Normalization: The objective of point cloud normalization is to standardize the scale, position, or orientation of point cloud data with respect to a reference framework. A straightforward normalization algorithm proceeds as follows: calculate the centroid by averaging the x, y, and z coordinates of all points; translate each point so that the centroid is positioned at the origin; compute the maximum distance Dmax from all points to the origin, define a desired normalized radius R, and calculate the scale factor as scale = R/Dmax; scale each point by this factor; if necessary, ascertain the principal direction of the point cloud using methods such as principal component analysis (PCA) and rotate the point cloud to align it with a predefined direction, such as the z-axis; and output the normalized point cloud. This algorithm first relocates the centroid to the origin and then scales the cloud by its maximum extent, ensuring that it falls within a standardized range; if required, the direction of the point cloud can also be adjusted. This normalization establishes a unified reference framework for subsequent point cloud processing and analysis.
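The centroid-and-scale steps above condense to a few lines. The sketch below is ours and omits the optional PCA rotation:

```python
import numpy as np

def normalize_cloud(points, radius=1.0):
    """Centre the cloud at the origin and scale it into a sphere of the
    given radius (scale = R / Dmax, as described in step 3)."""
    centered = points - points.mean(axis=0)
    d_max = np.linalg.norm(centered, axis=1).max()
    return centered * (radius / d_max)
```

After this transform the centroid sits at the origin and the farthest point lies exactly on the unit sphere, giving every cloud the same reference frame.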
4. DBH: Diameter at breast height (DBH) refers to the diameter of a tree measured at a height of 1.3 m (4.5 feet) above the ground. This measurement is widely used in forestry and ecology to estimate the age, health, and growth rate of trees. DBH is a crucial parameter for assessing forest resources, calculating timber yield, and making informed forest management decisions. In the field, a tape measure or a specialized DBH tape is typically used; by regularly measuring and recording DBH, researchers and forestry managers can monitor tree growth, health, and the overall condition of forest ecosystems. Seed-point-based single-tree segmentation can be employed to derive parameters such as tree height, DBH, and crown diameter. The DBH calculation selects the point cloud at breast height on an individual tree, fits a circle to it, and takes the diameter of the fitted circle as the tree's DBH. The crown diameter 2r can be obtained by measuring the crown area S and inverting the area formula S = πr². The point cloud DBH estimation algorithm is as follows: preprocess the data by applying a denoising algorithm and using ground segmentation to separate ground points from non-ground points (such as trees and other objects); locate the breast-height position by identifying the local ground elevation and adding 1.3 m; and extract a point cloud slice at breast height by taking a small range (chosen according to point cloud density) above and below the breast-height position and collecting all points within it.
Calculate the convex hull of the slice: employ a 2D convex hull algorithm to determine the convex contour of the tree trunk on the breast height slice. Calculate DBH: measure the maximum or average diameter of the convex hull, which will be the estimated DBH value. Optimization and calibration: if multiple trees or other objects interfere, clustering algorithms can be utilized to separate different objects and calculate DBH separately. Known reference objects can be used for scale calibration and to output accurate DBH values. In summary, DBH measurement at breast height plays a crucial role in assessing tree characteristics and making informed decisions in forest management. The point cloud DBH estimation algorithm provides a reliable method for accurately determining DBH values, contributing to the overall understanding and preservation of forest ecosystems.
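The slice-and-fit procedure of step 4 can be sketched as follows. Here we use an algebraic least-squares circle fit on the breast-height slice, the circle-fitting route mentioned above, rather than the convex-hull diameter; the function names, the `slab` half-width parameter, and the single-trunk assumption are ours:

```python
import numpy as np

def fit_circle_diameter(xy):
    """Algebraic (Kasa) least-squares circle fit in the XY plane.

    Solves x^2 + y^2 = 2*a*x + 2*b*y + c for the centre (a, b);
    the radius is sqrt(c + a^2 + b^2)."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, x**2 + y**2, rcond=None)
    return 2.0 * np.sqrt(c + a**2 + b**2)

def estimate_dbh(points, ground_z, slab=0.05):
    """Slice the trunk around breast height (ground + 1.3 m) and fit a
    circle to the slice to estimate DBH. Assumes one trunk in `points`."""
    z = points[:, 2]
    h = ground_z + 1.3
    sl = points[(z > h - slab) & (z < h + slab)]
    return fit_circle_diameter(sl[:, :2])
```

When several trunks fall in the slice, a clustering step (as noted above) must separate them before fitting each circle.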
5. CHM: CHM, short for canopy height model, is a two-dimensional data model that represents the height of the vegetation canopy above the ground, obtained by measuring the vertical distance from the ground to the top of the vegetation. A CHM is typically derived from remote sensing data, such as LiDAR or SAR, acquired from aerial or ground-based platforms. It gives researchers an intuitive way to observe and analyze the structure and height distribution of forests and other vegetation, with wide applications in ecology, forestry, and environmental science, including biomass estimation, carbon storage, tree growth, and forest health monitoring. Generating a CHM involves several steps. The relative height of each point is calculated by subtracting the elevation of the corresponding DEM position from the point's z value. The CHM is then generated by finding the point with the maximum relative height in each grid cell and assigning that maximum to the corresponding CHM cell. Smoothing techniques such as Gaussian filtering or other filters can then be applied, and the CHM is output. Additionally, further algorithmic processing can yield the number, position, height, and crown width of individual trees. Overall, the CHM plays a crucial role in studying and understanding vegetation characteristics, with diverse applications across scientific fields.
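The gridding step described above (per-point relative height, then a per-cell maximum) can be sketched as follows. This is a simplified illustration under assumed inputs: the DEM is given as a ready-made grid, and the function name and parameters are not from the paper.

```python
import numpy as np

def generate_chm(points, dem, origin, cell_size):
    """Rasterize a point cloud into a canopy height model.

    points    : (N, 3) array of x, y, z coordinates.
    dem       : (rows, cols) ground-elevation grid.
    origin    : (x0, y0) of the DEM's lower-left corner.
    cell_size : grid resolution in metres.

    Each point's relative height is its z minus the DEM elevation of
    its cell; each CHM cell keeps the maximum relative height."""
    rows, cols = dem.shape
    chm = np.zeros((rows, cols))
    ix = ((points[:, 0] - origin[0]) // cell_size).astype(int)
    iy = ((points[:, 1] - origin[1]) // cell_size).astype(int)
    valid = (ix >= 0) & (ix < cols) & (iy >= 0) & (iy < rows)
    for x, y, z in zip(ix[valid], iy[valid], points[valid, 2]):
        rel = z - dem[y, x]
        if rel > chm[y, x]:
            chm[y, x] = rel
    return chm
```

A Gaussian smoothing pass (e.g. `scipy.ndimage.gaussian_filter`) would then be applied to the returned grid before tree-top detection.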
6. Fine-grained annotation: point-wise tree segmentation labeling was performed with Pointly. Leveraging state-of-the-art machine learning algorithms, the software classifies and segments point cloud data effectively, enabling a quicker and more accurate understanding of the data. Pointly offers an intuitive user interface that lets even users with limited 3D data processing experience get started quickly. Furthermore, it supports various point cloud data formats and provides a wide range of data export and sharing functions, streamlining collaboration with team members and stakeholders. The resulting point cloud annotation can be seen in Figure 5, yielding a comprehensive DMM dataset with labels for trees, low shrubs, land, and other categories.
As shown in Table 2, SemanticKITTI [29] introduced a large dataset to promote LiDAR-based semantic segmentation research; it annotates all sequences of the KITTI Vision Odometry Benchmark, providing dense point annotations over the full field of view. Semantic3D [30] contains over 4 billion points across a variety of urban scenes, such as churches, streets, railway tracks, squares, villages, football fields, and castles; it provides detailed point clouds scanned with state-of-the-art devices and includes eight category labels. In contrast, our dataset was collected in over 1259 scenes using a backpack-mounted 16-line LiDAR. It contains 2144 million points and is divided into tall trees, low shrubs, land, and other categories. The dataset is further split into three parts: DMM-1, DMM-2, and DMM-3. DMM-1 is the point cloud of a single tree; DMM-2 covers multiple trees in a small range; and DMM-3 covers multiple trees over a large range, displayed with the cutting method. The original point clouds of all three parts are shown in Figure 3, and the corresponding labels in Figure 5.
2.4. Energy Splitting DMM Module
In the DMM module structure shown in Figure 6, the first step is to compute geometric operators for forestry point clouds. To describe forestry point clouds accurately, we use different feature descriptors from previous approaches: since forestry scenes contain no buildings or utility poles, we focus on linear, planar, and scattering feature descriptors and exclude vertical feature descriptors. The point cloud descriptors are defined geometrically. Equations (1)–(3) give, respectively, the linearity, planarity, and scattering of the point cloud: the linearity measures the extent of linear stretching and elongation within the point cloud's neighborhood, the planarity evaluates its flatness and conformity to a plane, and the scattering describes its divergence characteristics. These three features collectively capture the dimensional properties of the point cloud.
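A common way to compute these three descriptors (not necessarily the paper's exact Equations (1)–(3)) is from the sorted eigenvalues λ1 ≥ λ2 ≥ λ3 of each point's local neighborhood covariance, with linearity = (λ1 − λ2)/λ1, planarity = (λ2 − λ3)/λ1, and scattering = λ3/λ1. The sketch below assumes this standard eigenvalue formulation and uses brute-force KNN for brevity.

```python
import numpy as np

def dimensionality_features(points, k=10):
    """Per-point linearity, planarity, and scattering descriptors.

    For each point, the covariance of its k nearest neighbours is
    eigen-decomposed (lambda1 >= lambda2 >= lambda3) and the standard
    eigenvalue ratios are returned. Brute-force KNN; intended for
    small clouds only."""
    n = len(points)
    feats = np.zeros((n, 3))
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    for i in range(n):
        nbrs = points[np.argsort(d2[i])[:k]]
        cov = np.cov(nbrs.T)
        lam = np.sort(np.linalg.eigvalsh(cov))[::-1]
        lam = np.maximum(lam, 1e-12)  # guard against tiny negatives
        l1, l2, l3 = lam
        feats[i] = [(l1 - l2) / l1,   # linearity
                    (l2 - l3) / l1,   # planarity
                    l3 / l1]          # scattering
    return feats
```

On a perfectly collinear cloud (a branch-like structure), the linearity approaches 1 and the scattering approaches 0, matching the intuition in the text.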
We introduce the forestry feature descriptor 'R' to delineate tree characteristics [36]. For each point in the point cloud, its linear, planar, and scattering characteristics are calculated. As shown in the table below, the forestry characteristic factor R and the terrain characteristic factor r are computed for the DMM algorithm. Trees exhibit linear aggregation in their branches and trunks, dispersed needle-like leaves, and an overall vertical orientation relative to the coordinate plane; accordingly, we define the tree feature aggregation factor R to encompass the linearity, scattering, and elevation attributes of the point cloud. Ground point clouds in forestry environments, in contrast, often feature gullies and are characterized primarily by planar traits together with some linearity. Shrubs, being closer to the ground, exhibit dispersive traits, so we aggregate scattering and planarity to describe them.
To automatically identify valuable information within the aggregated cascaded features of the DMM dataset, we apply max pooling with an attention mechanism. Additionally, we employ an XGBoost [37] feature filter to refine the selection of relevant features, yielding the final point cloud feature description. In the visualization, purple corresponds to the tree feature factor R, and green to the terrain feature factor r.
As shown in the formulas above, for each point in the point cloud, the linear, planar, and scattering characteristics are calculated and concatenated into a new feature vector. Given N points, the KNN algorithm is run once per point to find its K nearest Euclidean neighbors, and for each neighbor the 3D coordinates of the center point, the 3D coordinates of the current point, their relative coordinates, and the Euclidean distance between them are concatenated. An MLP then adjusts the dimension of this vector to length d so that the neighborhood's point cloud features can be aggregated. Feature extraction proceeds through an attention-based transformation, where W is the learnable weight of a shared MLP: the learned attention values act as a soft mask that automatically screens important information, and they are used to weight and sum the surrounding point features, producing a reduced feature vector after attention pooling. The goal of the CutPursuit algorithm is then to minimize an energy function that usually comprises two main parts, a data fidelity term and a regularization term; each small block formed by over-segmentation is completed via S = CutPursuitSegmentation(L). Finally, the point cloud is divided into several blocks, each with similar internal characteristics; "P" is part of the regularization term, and the task reduces to finding the optimal energy.
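The attention pooling step can be sketched as below. This is an illustrative, RandLA-Net-style attentive pooling under assumed shapes, not the paper's exact layer: a per-neighbor, per-channel score is computed with the shared weight W, turned into a soft mask with softmax over the K neighbors, and used to weight and sum the neighbor features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pooling(neighbor_feats, W):
    """Attention-weighted aggregation over a point's K neighbours.

    neighbor_feats : (K, d) features of the K nearest neighbours.
    W              : (d, d) learnable weight of the shared MLP.

    Returns a single (d,) aggregated feature vector."""
    scores = neighbor_feats @ W       # (K, d) attention logits
    mask = softmax(scores, axis=0)    # soft mask over the K neighbours
    return (mask * neighbor_feats).sum(axis=0)
```

With W = 0 the mask is uniform and the pooling degenerates to a plain mean over the neighborhood, which makes the role of the learned weights easy to see.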
Our dataset yields three point cloud feature extraction visualizations, DMM-1, DMM-2, and DMM-3, as shown in Figure 7; red, purple, and green represent linear, planar, and scattering features, respectively. The top-left part of Figure 7 depicts the feature visualization of a single tree point cloud, where the branches mostly reflect linear characteristics, the canopy exhibits scattering, and the ground appears planar. The top-right part of Figure 7 shows a small-range visualization of several trees' features, where again the branches are predominantly linear, the ground planar, and the crowns scattered. The lower part of Figure 7 visualizes the cut point cloud features of a large-scale forestry scene, with the same pattern: linear branches, scattered crowns, and planar ground.
We now explain the methodology underlying the energy partitioning network. By processing the raw input point cloud computationally, we transform a dataset of millions of points into a structured format of hundreds of geometric partitions, within each of which the points exhibit highly consistent local geometric attributes. This geometric partitioning scheme is firmly rooted in the 3D geometric features.
For the input original point cloud P, geometric partitioning is performed using its intrinsic 3D geometric attributes, taking into account the distinctive features of each individual point. Notably, every point is assigned to exactly one geometric partition, with no overlap or multiple membership. We use max pooling to automatically learn the useful information in the aggregated cascaded features through the attention mechanism and apply the XGBoost feature filter to retain the valid features, obtaining the final aggregated features. Combining these features, we construct an unsupervised graph over the output features of the DMM module to produce an over-segmentation. Each point is represented by its local geometric feature vector, which encapsulates the aggregated features discussed earlier, and our primary objective is to optimize the solution for L in order to resolve the optimization problem described in the reference.
In solving this problem, we draw inspiration from the concept of greedy cutting, strategically applied to the 3D point cloud dataset. The following section outlines the energy-optimized procedure for aggregating the integrated 3D point cloud features.
Problem formulation: our goal is the minimization of the energy function L. The specific problem requires minimizing L following [38], a 2017 work introducing a working-set strategy for minimizing differentiable functions constructed on weighted graphs and augmented with total variation regularization.
We propose an enhanced algorithm that extends its applicability to functions containing non-differentiable segments distributed across the graph vertices, as illustrated on the left. In cases where the function g is differentiable with respect to the variable v, our algorithm identifies the locations of smooth points of the function F: points with zero differential but with both positive and negative left and right derivatives. This assumption holds when all the considered functions are convex, in which case a smooth point is equivalent to a global minimum.
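The energy being minimized can be made concrete with a small evaluator. This is a generic fidelity-plus-regularization form of the kind cut pursuit optimizes, not the paper's exact objective: the squared-distance fidelity, the edge-cut regularizer, and the weight name `rho` are assumptions for illustration.

```python
import numpy as np

def partition_energy(features, labels, edges, rho):
    """Energy of a candidate partition: fidelity + regularization.

    features : (N, d) per-point feature vectors.
    labels   : (N,) integer segment assignment for each point.
    edges    : iterable of (i, j) index pairs of the adjacency graph.
    rho      : regularization strength.

    The fidelity term sums each point's squared distance to its
    segment's mean feature; the regularization term counts graph
    edges whose endpoints fall in different segments."""
    fidelity = 0.0
    for seg in np.unique(labels):
        pts = features[labels == seg]
        fidelity += ((pts - pts.mean(axis=0)) ** 2).sum()
    cut_edges = sum(1 for i, j in edges if labels[i] != labels[j])
    return fidelity + rho * cut_edges
```

On a chain graph with two clearly distinct feature groups, splitting along the natural boundary pays one cut edge but drives the fidelity to zero, so the split has much lower energy than the single-segment labeling — exactly the trade-off a greedy cutting scheme exploits.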
In the over-segmentation map of the forestry environment scene in the DMM dataset, the trees collectively form a diverse array, while the terrain is segmented to excess, fragmenting the ground into numerous small pieces. This segmentation approach effectively distinguishes tall trees from low shrubs: distinct objects are partitioned into separate segments, and large objects are further decomposed into smaller constituent elements.
The top-left part of Figure 8 depicts the over-segmentation visualization of a single tree point cloud, revealing the division of one tree into different segments. The top-right part of Figure 8 showcases the over-segmentation of a small range of trees, where several trees with similar characteristics are aggregated into the same category. The lower part of Figure 8 presents the visualization of a cut large-scale scene, where the aggregation of the ground improves and many trees are merged into the same category.
2.5. PointDMM Net Network Structure
According to Figure 9, PointDMM is an end-to-end network architecture. Its key feature is the ability to process raw point cloud data directly and transform it into a feature representation with semantic information through a series of processing steps.
First, to meet the requirements of subsequent processing, the point cloud data are initialized and converted to h5 format. Considering the limits of computer memory, this step splits the point cloud into batches of 8000 points for efficient batch processing.
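The batching step can be sketched as follows. This is a plausible numpy sketch, not the paper's preprocessing code: the shuffle, the pad-by-resampling strategy for the last batch, and the function name are assumptions, and the actual HDF5 write (e.g. with h5py) is omitted.

```python
import numpy as np

def batch_point_cloud(points, batch_size=8000):
    """Split a large cloud into fixed-size batches for HDF5 export.

    Shuffles, then chunks the (N, C) array into batches of exactly
    `batch_size` points, padding the final batch by repeating
    randomly chosen points so every batch has the same shape."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(points))
    batches = []
    for start in range(0, len(points), batch_size):
        chunk = idx[start:start + batch_size]
        if len(chunk) < batch_size:
            pad = rng.choice(idx, batch_size - len(chunk))
            chunk = np.concatenate([chunk, pad])
        batches.append(points[chunk])
    return np.stack(batches)  # (num_batches, batch_size, C)
```

Fixed-size batches are what allow the later network stages to run with static tensor shapes under a bounded memory budget.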
Next, the point cloud data are processed by the DMM module described in Section 2.4. The DMM module divides the point cloud into multiple regions based on semantic correlation and local forestry features and then extracts features for each region, allowing PointDMM to capture detailed information in the point cloud data and improve its representation capability.
After feature extraction, PointDMM applies the T-network for rotation. This rotation, based on the method proposed by PointNet, improves recognition accuracy by rotating the point cloud about the z-axis; processing the cloud in a better-aligned pose enhances the accuracy of subsequent processing and analysis.
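The z-axis rotation itself is a standard rigid transform; a minimal sketch (function name illustrative):

```python
import numpy as np

def rotate_z(points, theta):
    """Rotate an (N, 3) point cloud by angle theta (radians) about
    the z-axis; heights (z values) are left unchanged."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T
```

Restricting the rotation to the z-axis keeps trees upright, so height-dependent features (such as the elevation attribute in R) are preserved while horizontal orientation is normalized.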
Following rotation, PointDMM expands the dimension of the point cloud features using a multi-layer perceptron (MLP). This step avoids feature loss during computation: by expanding the feature dimension, PointDMM retains more information, enhancing the effectiveness of subsequent processing steps.
PointDMM then performs convolution operations on the features. Through multiple convolution layers of 32, 64, 128, and 1024 dimensions, it further extracts features from the point cloud data and expands them to different dimensions. This step increases the expressive power of the features and improves the capture of information in the point cloud data.
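Such per-point convolutions are equivalent to a shared MLP applied independently to every point. The numpy sketch below illustrates the 3 → 32 → 64 → 128 → 1024 expansion with random placeholder weights (in the real network these are learned); the function name and ReLU choice are assumptions for the example.

```python
import numpy as np

def shared_mlp(points, widths=(32, 64, 128, 1024), seed=0):
    """Per-point shared MLP: the numpy equivalent of stacked 1x1
    convolutions. Every point passes through the same sequence of
    linear layers + ReLU, expanding its feature dimension step by
    step (here 3 -> 32 -> 64 -> 128 -> 1024)."""
    rng = np.random.default_rng(seed)
    feats = points
    for w in widths:
        weight = rng.standard_normal((feats.shape[1], w)) * 0.1
        bias = np.zeros(w)
        feats = np.maximum(feats @ weight + bias, 0.0)  # ReLU
    return feats

# A global descriptor can then be obtained by pooling over points,
# e.g. global_feat = shared_mlp(cloud).max(axis=0)  # shape (1024,)
```

Because the same weights are shared across points, the layer is invariant to point ordering, which is why this design suits unordered point clouds.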
Finally, PointDMM uses a sampling method to output the features. This sampling reduces the dimension of the feature representation, and semantic segmentation is performed with PointNet to obtain the final result: the point cloud is segmented into different categories, with a corresponding semantic label assigned to each.
In conclusion, PointDMM is an effective network architecture for analyzing and processing raw point cloud data. By processing the point cloud directly and extracting semantic information through a series of processing steps, it produces the final semantic segmentation result, making PointDMM a valuable tool for point cloud data analysis and processing.