1. Introduction
A recent study on urban growth typology shows that the number of high-rise buildings in China has increased substantially [1]. Building height information has significant application value in many fields, such as urban local climate [2,3], building energy consumption evaluation [4,5], urban pollution dispersion [6,7], urban carbon emission evaluation [8,9], earthquake perception [10], and urban 3D reconstruction [11]. Therefore, building height extraction over large regions is essential for a comprehensive understanding of urban development.
Remote sensing is the most commonly used means of building height extraction. Typically, building heights are extracted from one of three data sources: airborne light detection and ranging (LiDAR), side-looking radar imagery, and high-resolution optical imagery. Airborne LiDAR provides high-accuracy measurements [12]. LiDAR-based algorithms extract buildings and their heights through point cloud classification [13,14] or use building footprints from digital maps to reconstruct buildings in three dimensions [15]. However, airborne LiDAR is limited in coverage and costly. Algorithms based on side-looking radar imagery often require building footprints obtained from digital maps or other sources [16,17,18,19]. Moreover, because of the side-looking geometry, radar images usually record a mixture of signals from different microwave scattering mechanisms, which leads to relatively high uncertainty in the extracted building heights [20].
In contrast, optical satellite imagery can be acquired efficiently and offers abundant spatial detail, and it is therefore widely used for building height extraction. For single optical satellite images, the shadow-based method is commonly employed: it uses the geometric relationship among the sun, the satellite, building rooftops, and their shadows in the image to extract building heights [21,22,23,24,25]. However, the shadow-based method has difficulty extracting the heights of short buildings or of buildings whose shadows are occluded by other objects [26].
For stereo images, a common method generates a DSM through dense matching and projects building footprints or rooftops onto the DSM to extract building heights. Liu et al. [27] used semi-global matching (SGM) [28] to generate a DSM, applied morphological filtering [29] to the DSM to generate a DEM, derived the normalized DSM (nDSM) by subtracting the DEM from the DSM, and finally took the maximum nDSM value within each building as its height. Wang et al. [30] improved DEM generation with the more accurate cloth simulation filter (CSF) [31]. To address missing rooftop elevations in DSMs generated by SGM, Zhang et al. [26] proposed a contour-constrained rooftop matching algorithm for building height extraction.
With the rapid development of deep learning, deep learning methods have been widely applied to dense matching [32,33,34], opening up new possibilities for building height extraction. For instance, Chen et al. [35] used a DSM generated by deep learning algorithms for building height extraction. End-to-end deep learning methods have also been proposed for extracting building heights from stereo images. Cao et al. [36] designed a network that extracts buildings and their heights from multi-view, multispectral images. This method does not rely on dense matching algorithms but requires known building height data for training.
The GF-7 satellite captures panchromatic stereo images with a swath width of 20 km and a resolution finer than 0.8 m. Its backward camera is tilted at −5° and its forward camera at 26°, a configuration that balances limited occlusion against a wide stereo intersection angle. GF-7 therefore offers valuable data for building height extraction. However, the limited resolution and the large tilt angle of the forward camera challenge current dense matching algorithms and reduce their accuracy in building height extraction. Relevant research indicates that in DSMs generated by dense matching, many 3D breaklines are modeled as more or less smooth transitions from ground level to building level [37].
Figure 1a,b illustrates the impact of this problem on building height extraction; the DSM shown was generated by the algorithm of He et al. [32] from GF-7 stereo images of Xi’an. In Figure 1a, the ground elevation around the building is clearly inaccurate: while the actual ground elevation is 355 m, the DSM shows higher elevations. Figure 1b shows inaccuracies for high-rise buildings: the actual building height is 350 m, with a rooftop elevation of 702 m, and the reconstructed building differs substantially from the actual one in both shape and elevation. Figure 1c illustrates occlusion caused by trees in Guangzhou. Detailed data for both Xi’an and Guangzhou are provided in Section 3.1. These problems make it difficult for DSM-based algorithms to extract building heights accurately.
To improve the accuracy of building height estimation, we propose a building height extraction method enhanced by contour matching. Instead of overlaying building contours directly on the DSM, we use a contour matching algorithm to obtain more accurate rooftop elevations, and we apply ground filtering to the DSM to generate a DEM that yields more robust ground elevations. First, the given building contours, which can be footprints in ground space or rooftop contours on a GF-7 backward image, are matched to the GF-7 forward image by contour matching, and the rooftop elevation is extracted from the geometric relationship between the given contour and the matched rooftop. Second, the ground elevation around the building is extracted from the DEM obtained by filtering the DSM generated from GF-7 stereo images; GF-7 multispectral images are used to improve the accuracy of this ground filtering. Finally, the building height is computed as the difference between the rooftop elevation and the ground elevation.
The main contributions of this paper are as follows:
An object-level contour matching algorithm is proposed to extract the rooftop plane elevation. Unlike pixel-level dense matching, which can produce smooth transitions in the DSM, the proposed algorithm treats the rooftop as a single object and can therefore cope with complex rooftop details that interrupt matching.
A ground filtering method that considers ground types is proposed for ground elevation extraction. Most existing ground filtering algorithms are designed for multi-echo LiDAR point clouds and do not produce a good DEM when applied directly to a DSM generated from satellite stereo images. In our algorithm, multispectral imagery is used to help identify non-ground points and inaccurate ground points during ground filtering.
Our paper is organized as follows: Section 2 describes the building height extraction method in detail, including the case of building rooftops with multiple elevations. Section 3 demonstrates the effectiveness of the approach through experiments. The proposed algorithm is discussed in Section 4. Finally, Section 5 concludes this paper.
2. Methodology
The workflow of the building height extraction algorithm is illustrated in Figure 2. The algorithm requires GF-7 images, a DSM generated from GF-7 stereo images, and either building footprints in a geographic coordinate system or building rooftop contours in GF-7 backward images. The contour matching algorithm for building footprints (CM-F) is described in Algorithm 1. Building rooftop contours in GF-7 backward images may have unclear edges or may include podium buildings and building sides. Our algorithm uses the backward images to reduce the impact of unclear edges, and the differences between the forward and backward images can be used to identify building sides and podium buildings. The contour matching algorithm for building rooftop contours (CM-R) is described in Algorithm 2.
Algorithm 1. The contour matching algorithm for building footprints (CM-F)
Input: GF-7 forward image, building footprint, DSM.
Output: Building height H.
(1) Estimate the elevation search range of the rooftop (Section 2.4).
(2) For each candidate elevation in the search range, obtain the candidate building rooftop (Section 2.2) and calculate its weighted contour matching degree (Section 2.3).
(3) Determine the rooftop elevation from the contour matching degrees (Section 2.4).
(4) Extract the ground elevation around the building (Section 2.5).
(5) Calculate the building height H.
Algorithm 2. The contour matching algorithm for building rooftop contours (CM-R)
Input: Stereo pair images (GF-7 backward and forward images), building rooftop contour, DSM.
Output: Building height H.
(1) Generate epipolar images from the stereo pair.
(2) Extract contours from both epipolar images (Section 2.1).
(3) Estimate the disparity search range of the rooftop in the epipolar image (Section 2.4).
(4) Generate the building contour template from the rooftop contour (Section 2.2).
(5) Calculate the contour matching degree on the backward epipolar image (Section 2.3) and obtain the set of matched building edges in the backward image (Section 2.6).
(6) For each disparity in the search range, calculate the weighted contour matching degree (Section 2.3).
(7) Obtain the building rooftop elevation (Section 2.4).
(8) Calculate the contour matching degree in the forward epipolar image and obtain the set of matched building edges in the forward image (Section 2.6).
(9) Input both sets of matched edges and both contour matching degrees into Algorithm 3 to identify building sides and podium buildings.
(10) Extract the ground elevation around the building (Section 2.5).
(11) Calculate the building height H.
2.1. Image Contour Extraction
A building contour consists of edges formed by continuous curves or lines, which are matched with the edges extracted from the image during contour matching. The Canny edge detector [38] is used to extract edges in the image as contour points, and the image gradient direction is taken as the contour point direction, as shown in Equation (1):
$$\theta = \operatorname{atan2}\left(G_y, G_x\right) \qquad (1)$$
where $G_x$ and $G_y$ represent the gradients in the horizontal and vertical directions, respectively. The two-argument arctangent uses the signs of $G_x$ and $G_y$, so that the gradient direction $\theta$ ranges over [−π, π].
This study extends the range of contour point directions from [0, π], as used in conventional methods [39], to [−π, π]. Because of the parapet walls on rooftops, two adjacent edges that are otherwise indistinguishable appear in the image. With the extended range, these two edges can be distinguished by their opposite gradient directions. An example is provided in Figure 3.
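The effect of the signed direction range can be checked with a minimal Python sketch. It assumes the two-argument arctangent form of Equation (1) and a hypothetical parapet whose two edges have opposite intensity changes; `contour_direction` and `folded_direction` are illustrative names, not part of the original implementation.

```python
import math


def contour_direction(gx: float, gy: float) -> float:
    """Signed gradient direction in (-pi, pi], following Equation (1)."""
    # atan2 uses the signs of both components, so gradients pointing in
    # opposite directions receive directions that differ by pi.
    return math.atan2(gy, gx)


def folded_direction(gx: float, gy: float) -> float:
    """Conventional unsigned direction in [0, pi): opposite gradients collapse."""
    return math.atan2(gy, gx) % math.pi


# Hypothetical parapet: brightness rises across the outer edge and falls
# across the inner edge, so both edges share an orientation but not a sign.
outer_edge = (1.0, 0.0)
inner_edge = (-1.0, 0.0)

print(contour_direction(*outer_edge), contour_direction(*inner_edge))  # 0.0 3.141592653589793
print(folded_direction(*outer_edge), folded_direction(*inner_edge))    # 0.0 0.0
```

With the folded range, both parapet edges map to the same direction and cannot be separated; the signed range keeps them apart.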
2.2. Building Contour Template Construction
Building contour templates are constructed to describe building rooftops. Figure 4 illustrates the construction process. The vector polygon of the building is first simplified by the Douglas–Peucker algorithm [40]. Buffer zones are then created around the edges of the simplified polygon. The pixels within the buffer zones are treated as potential contour points that constitute the building contour template, and their weights are calculated from their distances to the building edges, as shown in Equation (2). In Equation (2), the weight depends on the buffer distance and on the distance from the point to the edge in pixels; this distance is negative when the point lies inside the building contour.
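The polygon simplification step can be illustrated with a short recursive sketch of the Douglas–Peucker algorithm, which we assume is the method cited as [40]. The tolerance and the example footprint are hypothetical; closing the ring by repeating its first vertex lets the open-polyline routine handle a building polygon, and this variant measures distance to the chord segment rather than to the infinite line.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to segment ab (endpoint distance if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: List[Point], tolerance: float) -> List[Point]:
    """Simplify a polyline, always keeping its two endpoints."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the segment joining the endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Split there, simplify both halves, and drop the duplicated split vertex.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical footprint ring with a 0.3-unit jog on its bottom edge,
# closed by repeating the first vertex.
ring = [(0, 0), (5, 0.3), (10, 0), (10, 10), (0, 10)]
print(douglas_peucker(ring + [ring[0]], tolerance=0.5))
# [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
```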
The direction of each potential contour point is perpendicular to the corresponding polygon edge. Because buildings are generally brighter than other features in remote sensing images [41], this direction is set to point toward the inside of the polygon. For any point on an edge, a line perpendicular to the edge is drawn through it, and the potential contour points crossed by this line form a set associated with that edge point. In contour matching, the contour point matched to the edge point is searched for within this set.
2.3. Contour Matching Degree Calculation and Building Contour Template Correction
The contour matching degree measures the similarity between the building rooftop and the contours in the image. It is calculated as follows. The building contour template is moved to the location of the candidate building rooftop in the image, so that each potential contour point corresponds to an image pixel. When the corresponding image pixel is a contour point extracted from the image, the angle between the potential contour point direction and the contour point direction is calculated, and the weight of the image contour point is then obtained with Equation (3), which includes a penalty coefficient. In our study, this penalty coefficient is set empirically to 0.5.
Within each set, the image contour point with the maximum weight is matched with the corresponding edge point, and this maximum weight is used to calculate the contour matching degree with Equation (4). When the candidate building rooftop changes, the rooftop contour in the image moves along the epipolar line. Edges perpendicular to the epipolar line therefore constrain the rooftop elevation most strongly, whereas the match of an edge parallel to the epipolar line barely changes as the contour slides along it. Consequently, increasing the weights of contour points on edges perpendicular to the epipolar line yields more accurate rooftop elevations, and the weighted contour matching degree is computed using Equation (5).
Equations (4) and (5) use the total number of sets and the circumference of the building contour in pixels. The value of the edge weight function is determined by the edge on which the edge point lies: it equals 2 when the angle between the edge and the epipolar line exceeds 60°, and 1 otherwise.
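The edge weighting rule stated above can be sketched as follows. The epipolar direction is assumed to be the image x-axis, as in row-aligned epipolar images. `weighted_matching_degree` is a hypothetical normalization (a weighted mean of per-point maximum weights) used only to show how the edge weights enter; it is not a reproduction of Equations (4) and (5).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def edge_weight(start: Point, end: Point,
                epipolar_dir: Point = (1.0, 0.0),
                threshold_deg: float = 60.0) -> int:
    """Return 2 if the edge makes more than threshold_deg with the epipolar line, else 1."""
    ex, ey = end[0] - start[0], end[1] - start[1]
    lx, ly = epipolar_dir
    # Lines are undirected, so |cos| folds the angle into [0, 90] degrees.
    cos_angle = abs(ex * lx + ey * ly) / (math.hypot(ex, ey) * math.hypot(lx, ly))
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    return 2 if angle > threshold_deg else 1


def weighted_matching_degree(max_weights: Sequence[float],
                             edge_weights: Sequence[int]) -> float:
    """Hypothetical weighted mean of the per-edge-point maximum weights."""
    total = sum(edge_weights)
    if total == 0:
        return 0.0
    return sum(w * s for w, s in zip(edge_weights, max_weights)) / total


print(edge_weight((0, 0), (10, 0)))  # 1: parallel to the epipolar line
print(edge_weight((0, 0), (0, 10)))  # 2: perpendicular to the epipolar line
```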
In practice, building rooftop contours produced by building extraction algorithms may have unclear edges. Building contour template correction improves the accuracy of the algorithm in this case. The contour matching degree between the building rooftop contour and the GF-7 backward image is computed, and the contour points matched in the backward image are used to recalculate the weights of the potential contour points. The corrected weights are calculated as follows. For each set, if a matched contour point exists, the distance between each potential contour point in the set and the matched contour point is calculated and substituted into Equation (2) to recalculate its weight. Otherwise, the weights of the potential contour points in the set are set to 0. The correction results are illustrated in Figure 4.
2.4. Building Rooftop Elevation Extraction
The principle of building rooftop elevation extraction is illustrated in Figure 5. From a known building contour, multiple candidate building rooftops are generated within the elevation search range of the rooftop. These candidate rooftops are projected onto the GF-7 forward image using the rational function model and verified by contour matching.
The conventional contour matching method [39] sets a threshold on the contour matching degree and takes the position of the maximum contour matching degree as the matched building contour. In rooftop elevation extraction, however, similar buildings or unclear building edges produce multiple local maxima of the contour matching degree, which lead to mismatches and large errors. Therefore, our study uses elevation information from the DSM to filter out local maxima with large errors.
The curve of the contour matching degree versus candidate rooftop elevation is acquired first. The elevation search range of the rooftop is estimated using Equation (6), in which a height margin is set slightly greater than the estimated maximum building height, and the base elevation is the minimum elevation within the building buffer zone.
For building footprints in geographic coordinates, the increment between adjacent candidate rooftop elevations is set according to the image resolution and the stereo intersection angle. For each candidate elevation within the search range, the candidate building rooftop is projected onto the GF-7 forward image, and its weighted contour matching degree is calculated as described above. For building rooftop contours in the backward image, the elevation search range is converted into a disparity search range. For each integer disparity within this range, the weighted contour matching degree and the corresponding rooftop elevation are calculated, which yields the curve of the weighted contour matching degree versus rooftop elevation.
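Candidate elevation enumeration for CM-F can be sketched as below. Two choices here are assumptions, not taken from the paper: the search range is taken as [z_min, z_min + height_margin], in line with the description of Equation (6), and the elevation step is chosen so that one step shifts the projected rooftop by about one pixel, dz = GSD / tan(angle). For a footprint in ground space projected onto the forward image, the example uses the 26° forward tilt; the paper states only that the step depends on resolution and intersection angle.

```python
import math
from typing import List


def candidate_elevations(z_min: float, height_margin: float,
                         gsd: float, view_angle_deg: float) -> List[float]:
    """Candidate rooftop elevations from z_min up to z_min + height_margin."""
    # Raising a point by dz displaces its image by about dz * tan(angle)
    # in ground units, so this step corresponds to roughly one pixel.
    step = gsd / math.tan(math.radians(view_angle_deg))
    count = int(math.floor(height_margin / step)) + 1
    return [z_min + i * step for i in range(count)]


# Hypothetical values: 0.8 m GSD, 26 degree forward tilt, 60 m height margin.
cands = candidate_elevations(z_min=350.0, height_margin=60.0, gsd=0.8, view_angle_deg=26.0)
print(len(cands), round(cands[1] - cands[0], 3))
```

A finer step costs more projections per building but reduces quantization of the recovered rooftop elevation.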
The minimum and maximum DSM elevations within the building buffer zone are used to filter the local maxima of the contour matching degree. The local maxima are sorted in descending order, and their corresponding rooftop elevations are recorded. If the largest local maximum satisfies the significance condition, the contour matching degree has a significant maximum, and its elevation is taken as the rooftop elevation. In the absence of a significant maximum, two situations are distinguished. If a local maximum satisfies the matching-degree condition and its rooftop elevation lies between the minimum and maximum DSM elevations, that elevation is taken as the rooftop elevation. Otherwise, the building rooftop is considered absent from the GF-7 forward image, which indicates that the building is occluded in the forward image or that the input building differs from reality.
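The peak-selection logic can be sketched as follows. The exact significance and matching-degree conditions are not reproduced in this text, so `significance_ratio` and `min_degree` are hypothetical stand-ins: a peak counts as significant when the runner-up falls below a fraction of it, and a fallback peak must reach a minimum degree and lie between the minimum and maximum DSM elevations in the building buffer.

```python
from typing import List, Optional, Sequence


def local_maxima(values: Sequence[float]) -> List[int]:
    """Indices of interior local maxima (left-inclusive plateaus)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] >= values[i - 1] and values[i] > values[i + 1]]


def select_rooftop_elevation(elevations: Sequence[float], degrees: Sequence[float],
                             dsm_min: float, dsm_max: float,
                             significance_ratio: float = 0.8,
                             min_degree: float = 0.3) -> Optional[float]:
    """Pick the rooftop elevation from a matching-degree curve, or None."""
    peaks = sorted(local_maxima(degrees), key=lambda i: degrees[i], reverse=True)
    if not peaks:
        return None
    best = peaks[0]
    # Hypothetical significance test: the runner-up is clearly lower.
    if len(peaks) == 1 or degrees[peaks[1]] < significance_ratio * degrees[best]:
        return elevations[best]
    # Otherwise keep the strongest peak that agrees with the DSM elevation range.
    for i in peaks:
        if degrees[i] >= min_degree and dsm_min <= elevations[i] <= dsm_max:
            return elevations[i]
    # No acceptable peak: treat the rooftop as absent from the forward image.
    return None
```

In the usage below (hypothetical curve), the stronger peak at 301 m is rejected because it lies outside the DSM range, and the DSM-consistent peak is kept: `select_rooftop_elevation([300, 301, 302, 303, 304, 305, 306], [0.1, 0.6, 0.2, 0.1, 0.55, 0.2, 0.1], 303, 310)` returns 304.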
2.5. Ground Elevation Extraction around the Building
Our method uses the results of GF-7 multispectral image classification to improve the accuracy of the DEM generated by ground filtering. GF-7 multispectral images are used to compute the normalized difference vegetation index (NDVI) and the normalized difference water index (NDWI), from which vegetation and water are classified. By projecting the input buildings into the DSM, building pixels can be identified in the DSM. Non-ground points, such as vegetation and buildings, are removed from the DSM. In addition, large textureless water bodies, which tend to cause mismatches, are also removed.
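A minimal sketch of the ground-candidate mask, assuming per-pixel reflectance in the green, red, and near-infrared bands and the common index definitions NDVI = (NIR − Red)/(NIR + Red) and NDWI = (Green − NIR)/(Green + NIR) (the McFeeters form). The thresholds are hypothetical, and the sketch removes all water pixels rather than only large water bodies.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0 for a zero denominator."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ground_candidate_mask(green: Grid, red: Grid, nir: Grid,
                          building: Sequence[Sequence[bool]],
                          ndvi_thr: float = 0.3,
                          ndwi_thr: float = 0.2) -> List[List[bool]]:
    """True where a DSM cell may be kept as a ground candidate."""
    mask = []
    for r in range(len(nir)):
        row = []
        for c in range(len(nir[r])):
            ndvi = normalized_difference(nir[r][c], red[r][c])
            ndwi = normalized_difference(green[r][c], nir[r][c])
            # Drop buildings, vegetation, and water before ground filtering.
            row.append(not (building[r][c] or ndvi > ndvi_thr or ndwi > ndwi_thr))
        mask.append(row)
    return mask
```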
Subsequently, inaccurate ground points around buildings and trees are removed. Figure 6a compares DSM profiles from LiDAR and from stereo images for a building in Guangzhou: the red line represents the DSM from stereo images, and the black line represents the DSM from LiDAR. On the ground indicated by the arrow, the stereo DSM is higher than the LiDAR DSM, so these points should be excluded from ground filtering. Figure 6b illustrates the method for identifying such inaccurate ground points. For each window near the building, the elevation change along four lines is calculated; if the elevation change along a line meets the removal criterion, the points on that line are considered inaccurate. Figure 6c shows part of the multispectral image of Guangzhou, and Figure 6d shows the points removed in this image. This process ensures that the elevation of occluded ground is estimated from nearby ground.
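One way to implement the four-line check is sketched below. It assumes square windows, takes the four lines to be the row, column, and two diagonals through the window center, and interprets "elevation change" as the elevation range along a line. The threshold `max_change` is hypothetical, since the removal criterion is not reproduced here.

```python
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]


def flag_inaccurate_points(dsm: Sequence[Sequence[float]], center: Cell,
                           half: int, max_change: float) -> Set[Cell]:
    """Flag cells on any of the four lines through center whose
    elevation range exceeds max_change (hypothetical criterion)."""
    rows, cols = len(dsm), len(dsm[0])
    r0, c0 = center
    flagged: Set[Cell] = set()
    # Row, column, main diagonal, and anti-diagonal through the window center.
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        line: List[Cell] = [(r0 + k * dr, c0 + k * dc) for k in range(-half, half + 1)]
        line = [(r, c) for r, c in line if 0 <= r < rows and 0 <= c < cols]
        heights = [dsm[r][c] for r, c in line]
        if heights and max(heights) - min(heights) > max_change:
            flagged.update(line)
    return flagged
```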
Finally, the progressive TIN densification algorithm [42] is used to filter the ground points in the DSM. Figure 6e shows the input DSM, and Figure 6f shows the generated DEM. The mean DEM elevation around each building is used as its ground elevation.
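The final ground-elevation and height step can be sketched on a small raster. The ring width around the building is a hypothetical parameter (the text specifies only "around the building"), and a brute-force neighborhood search is used for clarity rather than speed.

```python
from typing import Sequence


def ground_elevation(dem: Sequence[Sequence[float]],
                     building: Sequence[Sequence[bool]], ring: int = 3) -> float:
    """Mean DEM elevation of non-building cells within ring cells of the building."""
    rows, cols = len(dem), len(dem[0])
    values = []
    for r in range(rows):
        for c in range(cols):
            if building[r][c]:
                continue
            # Brute-force check for a building cell in the surrounding square.
            near = any(building[rr][cc]
                       for rr in range(max(0, r - ring), min(rows, r + ring + 1))
                       for cc in range(max(0, c - ring), min(cols, c + ring + 1)))
            if near:
                values.append(dem[r][c])
    return sum(values) / len(values) if values else float("nan")


def building_height(rooftop: float, dem: Sequence[Sequence[float]],
                    building: Sequence[Sequence[bool]], ring: int = 3) -> float:
    """Building height as rooftop elevation minus ground elevation."""
    return rooftop - ground_elevation(dem, building, ring)
```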
2.6. Segmentation of Building Rooftop Contours Containing Multiple Elevations
The materials of podium buildings and building sides are similar to that of the main building rooftop, which makes them difficult to distinguish in remote sensing images. Consequently, some input building rooftop contours include podium buildings and building sides. To address this problem, the differences between a building’s contours in the forward and backward images are used to segment such rooftop contours. The process is as follows:
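Algorithm 3. Building rooftop contour segmentation process
Input: Backward and forward epipolar images, building rooftop contour, sets of matched building edges in the backward and forward images, contour matching degrees in the backward and forward images.
Output: Main building rooftop contour and podium building rooftop contour.
(1) Identify the building contours that need to be segmented based on the two sets of matched edges and the two contour matching degrees.
(2) Extract samples of the main building rooftop and samples of the podium building rooftop using the two sets of matched edges.
(3) Classify the pixels within the rooftop contour by clustering and obtain the main building rooftop using the extracted samples.
(4) Take the remaining part of the rooftop contour as the podium building rooftop.
(5) Apply Algorithm 2 to the podium building rooftop.
(6) Classify it as a podium building or a building side.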
In contour matching, a matched building edge has a sufficiently long parallel line in the image. We propose the following method to identify matched edges. The building contour template is divided into subsets according to the edges of the building rooftop contour. For each subset, the total number of edge points is counted. For each edge point in the subset, the distance between the edge point and its matched contour point is calculated; to distinguish points inside the building contour from points outside it, distances of points inside the contour are given negative values. Because lines in the image have a width, consecutive distance intervals, each as wide as a parallel line (set to 2 pixels), are used to represent parallel lines. If the distance between an edge point and its matched contour point falls within an interval, the matched contour point belongs to the corresponding parallel line. The parallel line with the most contour points is the longest, and its number of contour points is recorded. When this number satisfies the matching condition relative to the total number of edge points in the subset, the edge is considered matched. The sets of matched edges in the backward and forward images are defined accordingly. Figure 7 shows two building rooftop contours and their corresponding sets of matched edges.
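The matched-edge test can be sketched by binning the signed distances into 2-pixel intervals. The acceptance rule `min_ratio` (the longest parallel line must hold at least this fraction of the subset's edge points) is a hypothetical stand-in for the condition in the text, and unmatched edge points are passed as `None`. Fixed bins can split a line that straddles a bin boundary, which a production version might handle with overlapping intervals.

```python
import math
from collections import Counter
from typing import Optional, Sequence


def is_matched_edge(signed_distances: Sequence[Optional[float]],
                    line_width: float = 2.0, min_ratio: float = 0.5) -> bool:
    """Decide whether one template edge is matched.

    signed_distances holds, for every edge point of the subset, the signed
    distance (negative inside the contour) to its matched image contour
    point, or None if no contour point was matched.
    """
    total = len(signed_distances)
    if total == 0:
        return False
    # Each line_width-wide distance interval stands for one parallel line.
    lines = Counter(math.floor(d / line_width)
                    for d in signed_distances if d is not None)
    longest = max(lines.values(), default=0)
    return longest >= min_ratio * total
```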
As shown in Figure 7, the matched edges differ between the forward and backward images. Because of the different tilt angles, building sides visible in the backward image are occluded in the forward image, and the relative location of the podium building and the main building changes. The differences between the two sets of matched edges therefore provide samples for building contour segmentation. From these two sets, we define one set of edges that belong to the main building rooftop and another set of edges that belong to the podium building rooftop. Buffering the main-rooftop edges and intersecting the buffer with the building contour yields samples of the main building; applying the same operations to the podium edges yields samples of the podium building. In Figure 8a, the red edges represent the main-rooftop edges, and the blue edges represent the podium edges. Figure 8b shows samples of the main building rooftop, and Figure 8c shows samples of the podium building.
The pixels within the building rooftop in the forward image are classified into main building pixels and podium building pixels based on their grayscale. The K-means clustering algorithm is used to group these pixels into eight clusters. For each cluster, the numbers of its pixels that fall in the main building samples and in the podium building samples are counted separately. If the former exceeds the latter, the cluster is considered part of the main building rooftop. The resulting main building rooftop is depicted in Figure 8d. Because panchromatic images provide only grayscale information, pixels with the same grayscale as the main building rooftop are misclassified. To address this issue, only the parts that overlap with the main building samples are preserved, as illustrated in Figure 8e. Thereafter, the longest edge of the original building contour is used for gap filling: for each pixel outside the main building rooftop, lines parallel and perpendicular to the longest edge are drawn through it, and if both ends of the parallel or the perpendicular line intersect the main building rooftop, the pixel is considered part of the main building rooftop. The result is the main building rooftop, and the remaining part of the building rooftop is the podium building. Figure 8f shows the classification result, where the red area represents the main building rooftop and the blue area represents the podium building.
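The cluster-voting step can be sketched with a plain one-dimensional K-means on grey values. The pixel coordinates, grey values, and samples in the usage example are hypothetical, and the sketch omits the overlap filtering and gap filling described above.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Pixel = Tuple[int, int]


def kmeans_1d(values: Sequence[float], k: int = 8,
              iters: int = 50, seed: int = 0) -> List[int]:
    """Lloyd's K-means on scalar grey values; returns one label per value."""
    rng = random.Random(seed)
    distinct = sorted(set(values))
    centers = rng.sample(distinct, min(k, len(distinct)))
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
                  for v in values]
        # Move each center to the mean of its members (keep empty ones fixed).
        new_centers = []
        for j, old in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def main_rooftop(grey: Dict[Pixel, float], main_samples: Set[Pixel],
                 podium_samples: Set[Pixel], k: int = 8) -> Set[Pixel]:
    """Pixels whose cluster contains more main-building than podium samples."""
    pixels = list(grey)
    labels = kmeans_1d([grey[p] for p in pixels], k)
    main_votes, podium_votes = Counter(), Counter()
    for p, lab in zip(pixels, labels):
        if p in main_samples:
            main_votes[lab] += 1
        if p in podium_samples:
            podium_votes[lab] += 1
    keep = {lab for lab in set(labels) if main_votes[lab] > podium_votes[lab]}
    return {p for p, lab in zip(pixels, labels) if lab in keep}


# Hypothetical rooftop: a bright main rooftop row and a darker podium row.
grey = {(0, c): 200.0 for c in range(4)}
grey.update({(1, c): 100.0 for c in range(4)})
print(sorted(main_rooftop(grey, {(0, 0), (0, 1)}, {(1, 0)})))
# [(0, 0), (0, 1), (0, 2), (0, 3)]
```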
The contour matching algorithm is then executed for the podium building rooftop. It is identified as a podium building when a matching building rooftop is found in the forward image; otherwise, it is considered an occluded building side. Using Zhang’s algorithm [26] as a reference, this paper conducted building contour segmentation experiments in Xi’an. Figure 9 shows partial results of the building contour segmentation.
3. Results
3.1. Data Description and Experimental Area
This paper selected three regions as experimental areas: Yingde and Guangzhou in Guangdong Province, and Xi’an in Shaanxi Province. Their basic details are as follows:
In the Yingde experimental area, the GF-7 image was captured on 11 October 2020. The center coordinates of the backward image were 113.409°E and 24.326°N, with solar zenith and azimuth angles of 33.466° and 158.717°, respectively. A total of 841 building footprints within this area were acquired. The images and building footprints of the Yingde experimental area are shown in Figure 10. The DSM used in the experiments was computed using He et al.’s algorithm [32]. LiDAR data of the area were collected as the reference for building heights. Figure 11 displays the DSM obtained from the LiDAR data and the DSM generated from the stereo images.
In the Guangzhou experimental area, the GF-7 image was captured on 14 March 2020. The center coordinates of the backward image were 113.329°E and 23.137°N, with solar zenith and azimuth angles of 32.013° and 140.211°, respectively, as shown in Figure 12. A total of 89,093 building rooftop contours were extracted from the backward image by a building extraction algorithm. The DSM used in the experiments was derived using He et al.’s algorithm [32]. LiDAR data of this region served as the reference for building heights. Figure 13 illustrates a portion of the extracted building rooftop contours, the DSM obtained from LiDAR data, and the DSM generated from stereo images.
In the Xi’an experimental area, we used the dataset provided by Zhang et al. [26]. The GF-7 image was captured on 17 February 2020, with the center of the backward image at 108.951°E and 34.255°N and solar zenith and azimuth angles of 50.029° and 154.657°, respectively. The Xi’an experimental area includes the tallest building in Xi’an (350 m) and its surroundings. A total of 34 building rooftop contours were manually marked in the backward image, and reference building heights were obtained by manually marking corresponding points. The DSM used in the experiments was calculated using He et al.’s algorithm [32]. Figure 14 illustrates the images, building rooftop contours, and the DSM generated from stereo images.
In the Yingde and Guangzhou experimental areas, reference building heights were calculated from LiDAR data as the vertical distance from the ground around the building to the rooftop surface. However, the LiDAR data and the GF-7 images were acquired at different times, so some buildings differ between them. To ensure the accuracy of the reference building heights, hundreds of buildings were randomly selected, and buildings with discrepancies between the GF-7 images and the LiDAR data were manually removed. As a result, 343 and 506 buildings were retained for accuracy assessment in Yingde and Guangzhou, respectively.
The buildings in the three experimental areas have distinct characteristics, which allows our algorithm to be validated in different cases. Figure 15 shows the distribution of reference building heights: most buildings in Yingde are below 20 m, the majority in Guangzhou are between 20 and 100 m, and half of those in Xi’an exceed 100 m. In addition, the challenges for contour matching differ across the study areas. In Xi’an, the manually marked building contours are accurate and easy to match. In Yingde, by contrast, the rooftops of adjacent footprints may overlap in the images, as depicted in Figure 16a. In Guangzhou, contour matching suffers from unclear edges, as depicted in Figure 16b.
3.2. Evaluation Metrics
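This paper evaluates the algorithm’s accuracy by comparing the extracted building heights with the reference building heights. The mean error (ME), mean absolute error (MAE), and root mean square error (RMSE) were chosen as evaluation metrics. They are calculated as follows:
$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(H_i - H_i^{\mathrm{ref}}\right)$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|H_i - H_i^{\mathrm{ref}}\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(H_i - H_i^{\mathrm{ref}}\right)^2}$$
In these equations, $H_i$ represents the extracted height of building $i$, $H_i^{\mathrm{ref}}$ denotes its reference height, and $n$ is the number of buildings evaluated.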
Because the experiments use only a sample of buildings, significance tests were conducted to assess whether differences between experimental results are statistically meaningful or could have occurred by chance alone. A t-test was used to compare the MAEs of two experimental groups. The null and alternative hypotheses of each t-test are detailed in the notes following the corresponding table.
3.3. Performance of Building Height Extraction
The evaluation results are shown in Figure 17. The MAE and RMSE of each group are presented in Table 1. A right-tailed two-sample t-test was conducted to compare the MAEs, and its results are summarized in Table 2. In addition, Figure 18 displays the 3D reconstruction models of the buildings. According to the statistical results and significance tests, our algorithm performed worst in Guangzhou and best in Xi’an.
Our algorithm was implemented in C++ and run on a desktop computer with an Intel Core i5-6500 processor (3.20 GHz, four cores, four threads), using OpenMP for multi-core parallelization. In the Guangzhou experimental area, contour matching processed 89,093 buildings in a total of 11,191 s, while ground filtering processed a DSM of 34,613 × 38,824 pixels in a total of 14,041 s.
3.4. Comparative Experiment
The following building height extraction methods based on GF-7 satellite images were chosen for the comparison experiments:
- (1)
The first method calculates building heights from the maximum and minimum DSM elevations within the building buffer zone [35], hereafter referred to as the ‘DSM method’.
- (2)
In the second method, the ground elevation around the building is extracted by our algorithm, and the building rooftop elevation is taken as the maximum DSM elevation within the building buffer zone, hereafter referred to as the ‘DSM + DEM method’.
- (3)
Wang et al.’s method [30] was chosen as the third comparison, hereafter referred to as the ‘nDSM method’.
- (4)
Zhang et al.’s method [26] was also compared with ours, hereafter referred to as ‘Zhang’s method’.
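For reference, the height rules of the three DSM-based comparison methods reduce to a few lines each. This is a simplified sketch on flattened lists of DSM and DEM values inside the building buffer or footprint; the function names and sample values are hypothetical, and the actual implementations in [35] and [30] may differ in details such as buffer construction and DEM generation.

```python
from typing import Sequence


def dsm_method(dsm_in_buffer: Sequence[float]) -> float:
    """'DSM method': maximum minus minimum DSM elevation in the building buffer."""
    return max(dsm_in_buffer) - min(dsm_in_buffer)


def dsm_dem_method(dsm_in_buffer: Sequence[float], ground: float) -> float:
    """'DSM + DEM method': maximum DSM elevation minus the extracted ground elevation."""
    return max(dsm_in_buffer) - ground


def ndsm_method(dsm: Sequence[float], dem: Sequence[float]) -> float:
    """'nDSM method': maximum of DSM - DEM over the building cells."""
    return max(z - g for z, g in zip(dsm, dem))
```

All three rules take the maximum DSM value as the rooftop, so they inherit any smoothing or spikes of the DSM at the rooftop; they differ only in how the ground elevation is obtained.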
Table 3 summarizes the accuracy of the comparative experiment. Because Zhang’s algorithm cannot use building footprints as input, we cite their experimental results in Xi’an [26] for comparison with ours. A right-tailed two-sample t-test was conducted to compare the MAEs of these methods, and the results are summarized in Table 4. ME was used to characterize the error distribution in this comparative experiment, and a one-sample t-test was conducted to test the null hypothesis that the errors come from a normal distribution with a mean of zero. Table 5 shows the results of the one-sample t-test.
Figure 19 shows the distribution of errors in building height extraction. The statistical analysis demonstrates that our algorithm achieved higher building height extraction accuracy than the comparative methods in all three study areas. The significance tests in Table 5 show that the errors of the DSM method and the DSM + DEM method do not have a zero mean; that is, these methods systematically overestimate building heights.
To compare our method with Zhang’s method, a t-test against a hypothesized mean was conducted. The null hypothesis states that the absolute errors of our method come from a distribution with a mean of 1.69 m. The test yielded a t-value of −0.6928 with a p-value of 0.4933, so the null hypothesis cannot be rejected; that is, Zhang’s method and our algorithm achieved comparable accuracy in the Xi’an experimental area. However, our method can also use building footprints as input, which makes it more versatile.
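The test statistics behind these comparisons can be sketched with the Python standard library. Whether the paper used Welch's or the pooled-variance two-sample test is not stated, so the sketch assumes Welch's form. A p-value can then be read from Student's t distribution, for example with `scipy.stats.t.sf(t, df)` for a right-tailed test.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def one_sample_t(samples: Sequence[float], mu0: float) -> float:
    """t statistic for H0: the population mean equals mu0 (df = n - 1)."""
    return (mean(samples) - mu0) / (stdev(samples) / math.sqrt(len(samples)))


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For the right-tailed comparison, pass the comparison method's absolute errors as `a` and ours as `b`; a large positive t indicates that the comparison method has the larger MAE.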
3.5. Ablation Experiment
To improve the performance of contour matching, this paper proposes two improvements: contour template correction based on the edges extracted from the backward image, and filtering of local maxima using the DSM. Their effects were examined in an ablation experiment using the following algorithms:
- (1)
Conventional contour matching algorithm [39], hereafter referred to as ‘CM-C’.
- (2)
Contour matching algorithm with contour template correction based on the edges extracted from the backward image, hereafter referred to as ‘CM-I’.
- (3)
Contour matching algorithm with local maxima filtering using the DSM, hereafter referred to as ‘CM-D’.
In Yingde, the contour matching algorithm for building footprints includes only the local maxima filtering module, so only CM-C was run in the ablation experiment. In Guangzhou, all methods were used. In Xi’an, because the building rooftop contours are highly accurate, the conventional contour matching method produced no mismatches; therefore, no ablation experiment was conducted.
Following the three-sigma rule of thumb, the thresholds for identifying mismatches were computed from the errors of our method. Table 6 presents the thresholds and the numbers of matched buildings and mismatches. Figure 20 illustrates the distribution of absolute errors in building heights. The experimental results demonstrate that our improvements effectively reduce mismatches.
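A minimal sketch of the mismatch count, assuming the three-sigma rule is applied as bounds of mean ± 3 sample standard deviations of our method's errors; the error values used in the check below are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def three_sigma_bounds(errors: Sequence[float]) -> Tuple[float, float]:
    """Mean +/- 3 sample standard deviations of the reference errors."""
    m, s = mean(errors), stdev(errors)
    return m - 3 * s, m + 3 * s


def count_mismatches(errors: Sequence[float], bounds: Tuple[float, float]) -> int:
    """Number of errors outside the bounds, treated as mismatches."""
    low, high = bounds
    return sum(1 for e in errors if e < low or e > high)
```

Computing the bounds from the proposed method's errors and applying them to every variant puts all variants on the same mismatch criterion.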
5. Conclusions
This paper proposes a method for extracting building heights from high-resolution GF-7 stereo imagery. The method uses contour matching to improve building rooftop elevation extraction. Within the contour matching process, it filters local maxima of the matching degree using a DSM to resolve mismatches, and it corrects the contour template to maintain precision when building edges are unclear. To improve the accuracy of ground elevation extraction around buildings, the method uses classification of GF-7 multispectral imagery to identify and remove error-prone regions in the DSM before ground filtering. The proposed method was validated in Yingde, Guangzhou, and Xi’an and compared with alternative algorithms. It is particularly advantageous for high-rise buildings. In rooftop elevation extraction, the algorithm treats the rooftop as an object and is unaffected by smooth transitions in the DSM and by rooftop appendages, which yields more accurate results. In ground elevation extraction, the method effectively removes non-ground points and inaccurate ground points from the DSM, which yields accurate results on flat terrain.
However, unclear building edges and occluded undulating terrain remain challenges for building height extraction. Future research could use semantic segmentation to identify building edges and other data sources to estimate ground elevation, improving the accuracy of both rooftop and ground elevations. In addition, satellite images from different cities, countries, and climate zones could be used to validate and improve the proposed method.