1. Introduction
Airborne laser scanning (ALS), or light detection and ranging (LiDAR) as this technology is alternatively known, has been used, and often relied upon [
1], to generate detailed and precise digital descriptions of ground surfaces [
2], model hydrologic systems [
3], assess forest above-ground biomass [
4], map snow depth [
5], monitor powerlines [
6], and measure terrain displacement induced by seismic events [
7]. Over urban landscapes, ALS has been used to delineate road networks [
8], detect impervious surfaces [
9], identify and describe individual trees [
10,
11], biomass [
12], estimate leaf area index [
13], identify and assess trees damaged by extreme weather [
14], assess the photovoltaic potential of residential buildings [
15,
16], and in many other applications.
Many ALS data analyses and applications, including some of those mentioned above, require or benefit from information describing the range and pulse angle associated with each return or echo. Range, the 3D distance from the sensor to each return, is typically used to normalize recorded return intensity, a parameter associated with the backscattering attributes and ultimately characteristics of targets illuminated by laser pulses [
17,
18,
19]. For a given beam divergence setting of the laser instrument, the calculated range to a return also provides the precise footprint of the pulse incident upon a target. The term pulse angle, defined here as the three-dimensional, angular deviation from a pulse directed perpendicularly to a horizontal plane [
13], is calculated by fitting a 3D line connecting the first and last returns of the pulse. The first and last return are identified as a pair across all returns by their shared GPS time [
20]. As pulses are diverted by an oscillating or rotating mirror, or other hardware component, across the trajectory of the laser instrument and airborne platform, the pulse angle changes rapidly and continuously during an acquisition. Over forested landscapes, the pulse angle is known to affect the computation of tree height, leaf area index, gap fraction, and canopy cover [
21,
22,
23,
24], all important parameters for forest resource assessment and planning. This is primarily because as the angle increases, the length of the pulse trajectory intersected by a tree crown also increases, and with it, the probability for a crown return. Conversely, the probability of a ground return decreases. This phenomenon, manifested as a horizontal displacement of ground returns with respect to returns from the canopy, is sometimes referred to as ‘LiDAR shadow’ and it is more pronounced with trees featuring a high leaf area index [
13].
Unlike pulse angle, scan angle is defined as the angular deviation from a pulse directed perpendicularly to the plane of the laser instrument and its magnitude is independent of platform attitude. The scan angle pertaining to a return is a required field in all American Society for Photogrammetry and Remote Sensing (ASPRS)-compliant ALS data files (LAS) [
25,
26] and is recorded as a char (1-byte element) with, by convention, positive values for pulses oriented towards the port and negative values towards the starboard side of the aircraft. The latest LAS 1.4 format contains a provision for an extension of the scan angle field to 2 bytes to provide finer angle resolution. On a perfectly level flight, pulse and scan angles are theoretically identical. Discrepancies are introduced as the aircraft attitude changes during data collection.
In a typical ALS flight, the directional vectors of the airborne platform and, hence, those of the laser instrument, are recorded by an inertial measurement unit (IMU). A high-end Global Positioning System (GPS) device records positions which are improved during a post-processing phase using data from a nearby, strategically placed, ground GPS station operating during the flights. This information and the pulse directional vectors relative to and recorded by the laser instrument are combined to calculate the trajectory of each emitted pulse. The time that elapses between pulse emission and interception of backscatter pulse photons determines the range and the 3D coordinates of each return associated with a pulse. A detailed description of the process is available in [
27].
The data collected during ALS operations and available as a potential deliverable includes the GPS location and IMU attitude information. These data, collectively called the aircraft trajectory, are recorded at high frequencies (~200 Hz) and are essential for the production of a point cloud from the range data recorded by the laser scanner. At this recording rate, each position represents a change of only about 30 cm along the aircraft trajectory assuming a speed of 220 km/h (119 knots). Position information in the trajectory is typically referenced in GPS week time (number of seconds from the beginning of the week, resetting, or rolling over, at 12:00 AM each Sunday). This presents challenges for acquisitions lasting more than one week because identical time stamps could refer to more than one platform position. To overcome this challenge, GPS time is typically replaced with the Adjusted GPS Time (number of seconds that have passed since the beginning of GPS time: midnight between 5 January and 6 January 1980, minus one billion seconds).
While critical for ALS data generation, platform trajectory data are not always included in data deliverables, primarily due to the absence of relevant clauses in acquisition specifications. This is the case, for example, with acquisitions supporting the United State Geological Survey (USGS) 3D Elevation Program (3DEP) [
28], a substantial initiative aiming at generating high-resolution digital terrain models (DTMs) across the continental United States. A survey of data from nearly 100 ALS acquisitions stored in a portal hosted by Oregon State University (
ftp://lidar.engr.oregonstate.edu, last accessed on 31 July 2019) showed that only about one in three contained platform trajectory data. Even lower, often much lower, percentages are reported in personal communications with analysts elsewhere in the United States, Canada, and Europe.
Even when trajectory data are included in deliverables, they are not necessarily in the projection of the raw return data. Many vendors have an established data processing workflow based on a projection of choice that is applied across their entire area of operations. All intermediate data products are organized in the preferred projection and are converted to the projection desired by the client just prior to delivery. Reprojection of the trajectory data to match the rest of the deliverables is often neglected. In the absence of adequate documentation regarding the registration of intermediate products, projection discrepancies can be challenging to resolve, especially when they involve datum shifts and other somewhat esoteric parameters. In addition, retrieval of trajectory data after, often long after, delivery can be logistically complicated, time consuming, and costly. Vendors may charge fees for copies of archived data, the archive could be missing, or the vendor could have ceased operations.
The goal of this study is to develop and present a new method that computes the trajectory of the laser instrument using the point cloud. We evaluate the resulting trajectories using ALS data from five acquisitions across different biomes in the United States Pacific Northwest region and compare the computed trajectory to the trajectory recorded during the data acquisition.
2. Materials and Methods
2.1. Study Area and Airborne Laser-Scanning (ALS) Data
We used ALS data from five acquisitions in the US Pacific Northwest region states of Oregon and Washington (
Figure 1), titled Colville, Deschutes, McKenzie, Tillamook, and West Metro. Each represents a distinct biome. All selected data sets contained airborne platform trajectory information provided by the vendor. GPS time was recorded in the weekly format except for McKenzie, where the adjusted format was used. McKenzie was acquired with a dual laser instrument configuration and had a platform trajectory file in a projection different from the one used for the raw laser data. The Tillamook area is representative of the temperate rainforests and topography of the coastal Oregon mountain range. It is subject to intense forest management. West Metro contains portions of the cities of Portland, Beaverton, and Hillsboro, with moderate tree cover over hills and low housing density in the northeastern section, gradually leveling off to plains with sporadic trees and high housing density towards the west. The McKenzie area contains the eastern portions of the City of Springfield to the west, followed by agricultural land, and then high tree canopy cover forests to the east towards the Cascade mountain passes. The Deschutes area, part of a much larger acquisition, is located at the eastern foothills of the Cascade mountains on rolling terrain and is characterized by open stands with medium to low tree canopy cover. The Colville area is in the northeastern part of the State of Washington. It is within the Northern Rockies ecoregion and is dominated by dry highland forests [
29]. Descriptive statistics of each study area are shown in
Table 1. Elevation and slope metrics were calculated using 1 m digital terrain models (DTMs) generated from the ALS data. Tree canopy cover calculations used a 2 m above-ground return height cutoff value as an indicator of tree returns and, for McKenzie and West Metro, a prior elimination of returns classified as ‘building’ by a GlobalMapper 20 LiDAR module [
30]. Acquisition specifications are shown in
Table 2. Across all areas, pulse density, and therefore return density, is high, exceeding 7 pulses per square meter. The field-of-view, or bilateral scan angle, is lower and sidelap is higher than what is common in ALS acquisitions elsewhere. The specifications for acquisitions selected for this study are similar to those used by the Forest Service, the Oregon Department of Geology and Mineral Industries (DOGAMI), and other partners collaborating on ALS acquisition in the region. They are considered optimal for minimization of artifacts such as the laser shadows mentioned earlier.
2.2. Pulse Returns to Ray Convergence
A laser pulse incident upon an opaque object yields a single return. When incident upon a non-solid object, such as a tree crown, each pulse will typically generate more than one return. Multiple returns can be generated from solid objects as well, if the pulse direction is nearly parallel to the object’s surface. An example for the latter case would be a pulse skimming the side of a building with one return located near the top, and one or more returns at different heights on the building’s side. As stated earlier, return coordinates are calculated using the parent pulse’s directional vectors and the position of the aircraft when the pulse is emitted. This implies that all returns generated by a single pulse should be located exactly on a 3D ray starting at the instrument and directed towards the target. Also, that the equation of the pulse ray can be reconstructed from the coordinates of returns, provided of course that there are two or more. It is a trivial exercise to confirm, however, that intermediate returns of a pulse yielding more than two are not located exactly on the ray defined by the first and the last because coordinates are stored with finite precision, usually rounded to a centimeter or equivalent unit fraction in non-metric systems. Coordinate precision issues propagate to the coefficients of the calculated ray equation and induce noise to the location of the ray. Noise levels increase in magnitude proportionally as the distance from the midpoint between the first and last pulse returns increases. At a kilometer away, the typical distance between returns and airborne platform location, the effects of return coordinate precision on pulse ray positioning can be substantial. We quantify those effects via simulation below.
Let two pulses P1 and P2 intersecting at scanner location S that is positioned at height HT above the ground. The coordinates of the last (P1
L, P2
L) and first returns (P1
F, P2
F) define the 3D equation of each pulse ray and the distance D between them (
Figure 2). P2 is set to be perpendicular to a horizontal plane so that the angle
θ it forms with P1 at location S is also the pulse angle of P1. Next, we add noise to the coordinates of P1
L, P1
F, P2
L, and P2
F of magnitude that follows the distribution of precision reduction introduced when real numbers are rounded to two decimal digits, the typical level of precision in ALS deliverables. It can be shown that the resulting per-dimension coordinate error term induced by the rounding has range [−0.005, 0.005] and is distributed ~U(0,2.88675*10
−3).
With all four return coordinates displayed in
Figure 2 jittered from their original position, the two pulse rays no longer intersect at S. We identify on each jittered pulse ray the point that is closest to the jittered instance of the other pulse ray. A third point located exactly on the middle of the line segment that connects these two closest, one on each ray, points is often called the closest point of approach (CPA). It can be computed using calculus [
31]. We adopted an alternative, more geometric approach described in [
32]. The 3D distance between the CPAs and S (
Figure 2) quantifies the effect of return coordinate precision on ray convergence.
We considered 9600 combinations of HT, distance D between first and last return, and pulse angle
θ. The values considered for each parameter are displayed along the axes of each panel in
Figure 3. For each combination, we run one million coordinate jittering instances, calculating for each the distance of the corresponding CPA to S and the distribution moments of that distance. All computations were performed in R [
33] with embedded C code for gains in computational efficiency. The error mean across parameter combinations ranged from 0.092 m to 134.128 m, with median 0.786 m, mean 2.193 m, and standard deviation 5.522 m. We finally regressed the three parameters,
HT,
D, and
θ, after applying a natural log transform to their values, against the natural log of the error means (see
Supplementary Materials). The resulting regression formulation was:
with R
2 = 0.9942. Equation (1) confirms that the error increases with platform height above the targets and decreases as the distance between pulse returns and the pulse angle increase. It implies that within a set of multi-return pulses of various angles and first to last return distances, we can identify those conducive to a more precise estimate of pulse convergence location and, therefore, of the position of the laser instrument onboard the airborne platform. We provide the code that performs the simulation so that variants of Equation (1) can be calculated for parameter values representative of a specific ALS data set.
With pulse repetition rates ranging from tens to hundreds of thousands per second and a nominal above-target flight height between hundreds and a few thousand meters, the position of the laser instrument onboard the airborne platform can be considered practically stationary during the time it takes to emit pulses within a single scan line, or, equivalently, a full swing of the oscillating or rotating mirror of the laser instrument. By calculating the ray equations of multi-return pulses from the coordinates of their first and last returns, we can estimate the location of the instrument as the centroid of all CPAs between them. The exhaustive computation of CPAs across all multi-return pulses emitted within a single scan line is logistically infeasible though. Consider, for example, an instrument with 150 kHz pulse repetition frequency, 60 Hz mirror (scan) rate, and ±30 degrees scan angle, operating over a forested landscape. Assume also that, owing to customer-preferences, pulses with absolute scan angle higher than 20 degrees are filtered out prior to delivery. Within 16.7 ms, the time it takes for a full mirror swing, 2500 pulses will be transmitted. After removing one-third of them as exceeding the scan angle specifications and another one-half expected to yield single returns, we are left with 833 multi-return pulses. The pairwise combinations of those 833 pulses are 346,528 per 16.7 ms or 20,791,680 s−1. For a scanning swath generated during 10 min of instrument operation, the pulse combinations are more than 12 billion (1.2475 × 1010), along with an equal number of CPAs, the calculation of which comes with substantial computational cost. Further, an exhaustive computation will include many near parallel pulse ray pairs that would generate CPAs positioned at larger distances from the actual platform position. According to Equation (1), for HT = 1000 m, D = 10 m, both common settings, and θ = 0.1 degrees (near-parallel pulse rays), the expected CPA error is 72.8 m, a large distance. We have confirmed experimentally that CPA outliers emerging during a single scan swing are not distributed uniformly around the true platform position and thus bias its estimation calculated as the centroid, or other similar metric, across all CPAs for the scan swing in question. We thus adopted an alternative, selective rather than exhaustive approach that identifies specific pulse pairs to use for CPA calculation. The selective approach is computationally efficient.
Raw ALS data are typically delivered organized in tiles of rectangular planar extent. Each tile contains returns from multiple adjacent scan swaths. In our application, the tile files must first be reorganized into individual swaths. We provide a program that supports the reorganization process. Our program assumes the swath identifier is stored in the “point source ID” field in LAS files (see
Supplementary Materials) and only works with LAS formats up to version 1.2 [
26]. Similar capability is also available in LAStools using the lassplit program. LAStools supports all versions of the LAS format and data compressed using the LAZ format [
34]. The scan swaths are then processed sequentially. Each is queried to identify and remove single return pulses and the intermediate returns of multi-return pulses. For the remaining pulses, the 3D distance between the first and last return and the pulse angle are calculated and entered in Equation (1) to generate pulse-specific estimates of the expected error in pulse ray convergence to the actual airborne platform location. The computed estimates are treated as pulse-specific weights. The sign of scan angle pertaining to each pulse, recorded in the raw laser file, is applied to the weight, resulting in positive values towards one side of the swath and negative towards the other. Although the number of weights that must be computed can still be large, approximately 3 × 10
7 in the 10-min swath given as an example above, it is approximately three orders of magnitude smaller than what it would be required for an exhaustive calculation of CPA across pulse pairs. In addition, the calculations embedded into Equation (1) are a linear scaling of tabular values, namely pulses angle and the distance between first and last return, calculated previously and are thus performed efficiently.
Next, we specify a time interval Δt that must be at least equal or larger than the time it takes to complete a full swing of the laser instrument’s mirror. Δt is used to assign swath pulses into contiguous, in terms of GPS time, groups, henceforth referred to as time blocks. The reciprocal of the scan rate, usually mentioned in data delivery reports, can be used to determine the minimum value of Δt. Then the weights of the pulses in each time block are queried. The two with the extreme values are identified and the CPA of their rays is calculated. Henceforth, they are referred to as the optimal pulses in the time block, in the sense that they their CPA is expected to be the closest among all other pulse pairs of the time block to the actual trajectory of the laser scanner. The top panel of
Figure 4 shows an example with the spatial distribution of the first returns of pulses selected for CPA calculation for each time block. The bottom panel details the rays of the two selected pulses within the return cloud of a single time block. The collection of calculated CPAs, one for each time block and each carrying a GPS time reference, represents the trajectory of the airborne platform location for the scan swath.
2.3. Closest Points of Approach (CPAs) to Platform Trajectory
The form of a CPA set calculated for a scan swath using the methodology described above can vary, depending on the characteristics and spatial distribution of the objects illuminated by the laser. In general, CPA sets appear as linear features, and thus conducive to a detailed and precise extraction of the actual airborne platform trajectory via interpolation. A closer inspection, however, often reveals discontinuities, primarily along the
Z axis, and noise, sometimes of substantial magnitude. To prevent serious issues with trajectory interpolation, CPA outliers must be removed. We do this using an iterative process that relies on fitting cubic splines [
35,
36] to each CPA set. We define a vector of distance thresholds sorted in descending order, and CPA points with residuals from the spline fit exceeding the threshold that is effective for the iteration in question are removed. The 3D distance thresholds used in the five study areas were 500, 300, 200, 150, 100, 75, 50, and 25 m respectively. The cubic spline fit in the last iteration becomes the estimated platform trajectory. The formulation of the spline uses the GPS time as an independent variable. For a given time, it provides an estimate of the [X, Y, Z] vector of the laser instrument.
Cubic splines are a specific type of polynomial interpolation techniques. Unlike them, standard methods apply an nth degree polynomial to a set of n + 1 unique values with y
i = f(x
i), (i = 0, …, n), where x
i represents the observed values and y
i the interpolation estimates. A major deficiency of these methods is that as the number of observations increases, so does the degree of the polynomial, leading to overfitting and oscillatory polynomial behavior particularly towards the ends of the observed value series. Splines attempt to overcome the overfitting issue by utilizing several, instead of just one, low degree polynomials P
i(x), each fit in the interval between x
i-1 and x
i. The degree of the polynomials defines the spline type. Cubic splines S(x) utilize third order polynomials with formulation that ensures continuity and smoothness. Continuity implies that any two consecutive polynomials yield the same value estimate, or ‘connect’, at joint value x
i, and P
i(x
i) = P
i+1(x
i). Smoothness requires that they have the same derivative at their joint value, or
, where λ is a smoothing parameter. Larger values of λ induce more smoothing. The objective function of a cubic spline is:
where P″ is the second derivative of the polynomial effective at value x. The first component of the objective function represents the typical standard error between observed and predicted values. The second component is known as the smoothing operator and determines the curvature of the spline. If the interpolation is exact, the spline is forced to pass through each of the observed values y
i, the standard error component of Equation (2) is zero, and the curvature of the spline could be rather pronounced, mimicking the oscillatory polynomial behavior mentioned earlier, thereby inflating the overall value of the objective function. Conversely, if λ = 0 no smoothing is applied, often leading to a standard error component of large value, especially if the x
i values contain substantial amounts of noise. The optimization of the objective function is achieved iteratively, within predefined bounds of λ. Alternatively, the optimization can be applied to a subset of the observed x
i values. Given the arrangement of a CPA set in space, the optimization is carried is each dimension separately. We fit cubic splines to all CPA sets using the smooth.spline function in the R stats base package considering the CPA parent pulse weights. The value of parameter λ is implicitly constrained within [−1.5, 1.5].
We investigated systematically the effects time block duration (Δt) and CPA intensity, defined as the time interval between points considered in the cubic spline optimization process detailed above, have on airborne platform trajectory estimates obtained for flight lines in each of the five study areas. We used Δt values from 0.01 s to 0.05 s at 0.01 s intervals and from 0.05 s to 1 s at 0.05 s intervals (total 24) and CPA intensity ranging from 1 point per second to 1 point every 10 s of flight time, at 1 s intervals. A total of 240 cubic splines were calculated for each scan swath. To assess goodness of fit, each spline was compared to the recorded platform trajectory, using the GPS time values present in the latter as reference. Finally, we evaluated how the discrepancies between estimated and observed trajectory lines propagate to the calculation of pulse angles and the relationship between pulse and scan angles recorded in the raw laser data.
4. Discussion
The approach introduced in this study can retrieve the trajectory of an airborne laser instrument by exploiting the spatial arrangement of multiple return pulses it generates. Over forested areas the retrieval is accurate and precise, provided that recommended values for parameters which control the process are applied. In the presence of continuous canopies, time block duration under 0.5 s, and time interval between CPAs not exceeding 5 s, submeter precision should be expected (
Figure 7). In such conditions, the estimated trajectory obtained by using our approach should be considered practically equivalent to the actual trajectory with a degree of precision adequate for most uses. The error rates observed over forested areas ensure that efforts to calibrate the raw recorded return intensity using the range (distance) and angle between the instrument and individual returns will require adjustments that are no more than 1 unit, for intensity quantified in the 0–255 range, different from those obtained by using the actual trajectory. Differences between estimated and recorded pulse angles will not exceed a few mrad. Even with suboptimal parametric choices, the trajectory estimation error should be expected to not exceed a few meters. The observed performance was achieved by applying our approach to dense return clouds (>7.5 pulses m
−2). For optimal performance in acquisitions with much lower pulse densities, time block duration and CPA interval might need to be adjusted.
Our methodology partitions individual scan swaths into continuous, non-overlapping sections (time blocks), and for each section it identifies the pair of pulses believed to be optimal for assessing the position of the laser scanner. These provisions reduce the overall computation load and expedite the process. Our approach performs best where multi-return pulses are present in all time blocks and swath parts. Sporadic departures from this ideal arrangement are usually tolerated quite well (
Figure A3). Where, however, there is a paucity of multiple return pulses across several consecutive time blocks or where the multi-return pulses are concentrated in a specific part of the swath, performance is suboptimal and the estimation error higher. Agricultural land and urban areas with limited tree cover belong to this category. Both discrete and waveform laser data sets restrict identified returns to be no closer than a threshold distance along the pulse trajectory. In the five study areas this threshold was between 1.5 and 2 m. The implication is that vegetation shorter than the threshold, the usual case over agricultural crops, will be represented only by single return pulses that are removed in an early stage of our process because they do not permit the calculation of the pulse’s ray. In urban areas, the majority of targets are opaque to the pulse photons and yield only one return. Buildings often completely occlude trees and tall plants or limit their exposure to the scanning instrument from limited viewing directions. Similarly, pulses with multiple returns from building sides and other manmade structures are concentrated to a narrow strip underneath the airborne platform where their angles to the vertical are narrow. In such conditions, and as implied by Equation (1), the CPAs generated can have large location errors. This is manifested in the West Metro area where the estimation errors are much larger than all the other study areas. Results from the same area suggest that in the presence of lower density and lack of uniformity in the spatial distribution of multiple returns, the value of the CPA interval parameter should be higher than those over forested areas.
We explored many alternatives and scenarios aiming at improving the scanner trajectory estimation for scan swaths containing large areas with few multiple returns or with multiple return pulses concentrated to a particular section but were unable to identify a consistent solution for those challenging circumstances. The alternatives explored included, but were not limited to, conditioning the directional vectors of the estimated trajectory to parallel the ground surface, locally replacing the cubic with a linear spline, or using dynamic, instead of fixed, time block duration adjusted to the spatial distribution of multi-return pulses. Each of these alternatives was found to improve the estimation precision for a selected subset of swaths but reduced it in most others. The approach detailed here had the most consistent performance.
At the design phase of an ALS acquisition, flight lines are drafted to minimize flight time and, therefore, cost. Longer flight lines are usually preferred (
Figure 1), often positioned in approximate alignment to physiographic features such as watersheds. The exact spatial extent of data deliverables in the US pacific northwest, however, is often determined by land ownership (
Figure 5). The resulting trimming of the return clouds usually affects several swaths proximal to the acquisition boundary and thus our ability to retrieve the scanner trajectory precisely. Trimming along the side of the swath leads to much higher error levels than trimming at the ends. Post-trimming pulses tend to be concentrated to smaller sections of the original swath and their converging rays form acute angles. The simulation results mentioned in
Section 2.2 suggest that in such conditions calculated CPAs carry much higher positional uncertainty compared to those of complete swaths. Our analyses with actual data have corroborated this assumption. The acute angle formed between a pair of pulses identified as optimal within a time block, coupled with return coordinates rounding to two digits, occasionally yield CPAs that are kilometers higher than the true position of the laser instrument. At other times, the CPA of a pulse pair is below ground! The iterative fitting of cubic splines is very effective in eliminating outlier CPAs, but it also removes a few legitimate ones, especially where the airborne platform pitch changes rapidly, an event that occurs quite frequently over mountains with alternating ridges and valleys. To quantify the swath trimming effects, we presented our results separately for pulses with and without substantial boundary trimming and/or end trimming. The metric used (3D RMSE) to evaluate performance is the standard applied to performance assessments of models that have spatial components. It is, however, sensitive to outliers. As documented in
Figure A3, trajectory estimation errors corresponding to a small part of the swath processed, even for a few seconds on either of its ends, inflate the overall error. To mitigate this sensitivity, we used percentiles as metrics of 3D RMSE distribution in our presentation of analysis results.
An analyst applying our methodology can only evaluate the residuals of the spline fit to CPAs (
Figure 6 and
Figure A1) as indicator of method performance. A typical residual median should be expected to be in the 2 to 4 m range over forested areas, and perhaps twice as large elsewhere (
Figure 8). Our results indicate that the actual trajectory estimation error (3D RMSE) would likely be much smaller, about one fourth of the spline fit residual median over forests, and half elsewhere. Although not common, and despite efforts to avoid it, the residuals of the spline fit to CPAs can be zero. This implies an exact fit, with the spline intersecting every CPA, and that the value of the smoothing parameter λ in Equation (2) is near the bounds specified, [−1.5, 1.5] in our analyses. In such occurrences, the sinuosity of the spline is pronounced and the trajectory estimation error large. It is suggested that for the rare cases of swaths with exact spline fit and zero residuals, the time block duration (Δt) and/or the CPA intensity settings are altered slightly and the fitting process is repeated.
Our process was designed to be fully automated so that it can be applied effortlessly to acquisitions of any size. Manual adjustments of Δt and CPA interval coupled with visual assessment of the cubic spline fit will likely improve the trajectory estimation precision for most swaths with the problematic multiple-return pulse distribution characteristics mentioned above. Manual adjustments can also improve the estimation precision for swaths trimmed with the acquisition boundary. For example, the trimmed swath depicted in red in
Figure 4 can be split into three sections. While the split will not improve the spline fit of the two shorter sections, it will reduce by more than half the 3D RMSE for the longer, as compared to the error for the entire swath, before the split. Manual interventions by an analyst may be realistic for a data set comprising a small number of swaths. For larger acquisitions, however, as the five used in this study, data processing involving multiple iterations and substantial analyst involvement can be economically and logistically infeasible.
The implementation of our approach requires two passes over the raw return cloud files, one to reorganize the tiled data into swaths, and a second for the derivation of the scanner’s trajectory. Reorganizing the tiled data into swaths is a simple, fully automated task if each return is furnished with a unique flight identifier. Alternatively, the GPS time data can be used to identify swaths, but the process can be complicated if time is recorded in the standard format and the acquisition has lasted more than a week. In such cases, different swaths can have overlapping time intervals and spatial extents. In rare cases, the reorganization process may require several passes over the data and analyst’s supervision. While tile size is controlled by the data vendor to a manageable size, files generated from the reorganization into individual swaths can be very large. In this study, there were many swaths more than 40 km long, with compressed file size [
34] larger than 3GB. Processing of very long swaths may necessitate the use of computers with compatible hardware configuration (enough memory). Otherwise the long swath must be first split into chunks with a few seconds of time overlap, followed by processing of each chunk separately to identify time blocks and to calculate CPAs, and finally fuse the chunk CPAs, prior to fitting the cubic spline.
In the event that the nominal above-target height of the scanner cannot be established from the data delivery report or otherwise, a tentative value can be used to compute the vertical distance between the derived platform trajectory and the corresponding terrain elevation for a subset of the scan swaths. The resulting approximate mean above-target height can then be used with the entire data set. As documented in
Table 4, discrepancies between actual and approximate nominal heights have limited impact on the overall precision of the estimated scanner trajectory. Achieving the highest attainable precision would require the process to be repeated twice. Once following the standard approach to calculate an initial trajectory, and a second time using time block-specific scanner heights calculated as the vertical difference between individual returns and the initial trajectory. The extra effort may be warranted in data acquisitions over mountainous terrain where departures from the nominal above-target height can be frequent and substantial.
The coefficients of pulse angle and distance between first and last return are used as weights to determine the pair of pulses present in a time block that is optimal for assessing the location of the laser scanner. They can be computed via simulation for a range of pulse angles and nominal scanner height above the target. The code provided for this purpose assumes that the only source of imprecision in the coordinates of returns is rounding to two decimal points. However, return coordinates stored with limited decimal precision in an intermediate system favored by the data vendor and reprojected, using the same level of precision, to the coordinate system requested by the client just prior to delivery carry positional uncertainty that is not anticipated by the code provided. The simulation settings must be adjusted to account for reprojection conditions, before being used to calculate pulse weights.
It is often stated that the scan angle recorded per individual return in raw laser files (see
Supplementary Materials) is an adequate surrogate of the pulse angle for analyses that require the latter. Our calculations in each of the five study areas did not support this claim. As shown in the example of
Figure A4, a considerable portion of returns exhibited substantial discrepancies between scan and pulse angle. After examining thousands of kilometers or recorded scanner trajectories in the US Pacific Northwest, we determined that these discrepancies emerge primarily from instability in the airborne platform’s pitch. Roll was much more stable. All aircraft used for laser data acquisition in our study areas maintained an upward pitch of about −5 degrees (electronic supplement). This is necessary to ensure adequate air lift. As a result, pulses are nominally directed forward instead of nadir. Larger discrepancies between pulse and scan angles are observed where the airborne platform dives or climbs abruptly, in response to local atmospheric instability or in an attempt to maintain nearly constant distance to the targets, especially over mountainous terrain. It should be noted that for gyrostabilized laser instruments, scan and pulse angular difference are expected to be minimal.
The technique described can be applied to data acquired with unmanned aerial systems (UAS), provided that they fly in approximately linear trajectories, can record at least two returns per pulse, and have return GPS time recorded with precision conducive to identifying individual pulses. Calculated UAS trajectories may be more precise than those computed from airborne systems because their above-target height is much lower and return coordinates are typically recorded with 3 digits of precision. Unlike UAS-based laser data, return clouds from Geiger mode or photon-counting systems are incompatible with our approach owing to fundamentally different data organization in a cubical tessellation with each ‘return’ representing backscattered photons from more than one pulse.