1. Introduction
Global urbanization poses a considerable challenge to the goal of sustainable urban development. According to recent estimates of the United Nations [1], about 70% of the world’s population will live in cities by 2050. By that time, it is expected that 90% of population growth, 80% of the increase in wealth and approximately 60% of energy consumption will occur in urban areas. This unprecedented rise of cities and urbanized areas demands new strategies that help to achieve economic growth as well as social equality and stability, while protecting the environment and reducing the impacts of climate change [2].
To this end, understanding and monitoring the changes in the size and composition of the built environment has become a central element of political frameworks. Up-to-date and globally consistent data on the status and characteristics of the built-up area are at the core of national and local initiatives focusing on sustainable development [3,4,5,6]. However, due to the dynamic and rapid nature of global urbanization, such data (and the empirical evidence derived from them) are still rare. Earth observation (EO) has already been successfully employed to globally identify human settlements and to delineate their size in terms of their horizontal extent. Different authors, for instance, introduce various methodologies for the extraction of human settlements, describing the built-up area as urban/rural, built-up/non-built-up, or in terms of the population counts per settlement area [7,8,9,10,11,12]. Moreover, human settlements are further characterized in relation to the percentage of impervious surfaces within the built-up area, allowing, to a certain degree, for an assessment of the built-up density and its corresponding population [13,14,15,16].
While the datasets resulting from these approaches have been used in a wide range of applications, they remain limited by the fact that they only represent a two-dimensional (2D) mapping of the settlement characteristics in the horizontal dimension. In other words, any detailed and effective examination of the built-up environment in terms of the volume or floor space (as the optimal measure for built-up density), urban morphology, or population distribution inevitably requires the consideration of the vertical dimension. In this context, precise studies on the three-dimensional (3D) extent of urban structures at the level of single cities have been frequently reported in recent years, with many approaches relying on digital elevation data derived from very high-resolution aerial and satellite imagery or airborne light detection and ranging (LiDAR) data [16,17,18,19,20,21]. Another interesting approach is Synthetic Aperture Radar (SAR) tomography, which uses data from spaceborne imaging radar systems to reconstruct the location and height of buildings at a very high spatial resolution [22]. However, so far this method has only been applied in studies at a city or regional scale, mostly due to the comprehensive data stacks required.
The first approaches suitable for measuring the status and development of vertical expansion at a large scale were introduced by Frolking et al. [23] and Mahendra and Seto [6]. In order to quantify the outward and upward expansion of cities worldwide, both studies analyze data on the backscattering intensity of the built-up environment derived from NASA’s SeaWinds microwave scatterometer in combination with information from nightlight data and the Global Human Settlement Layer, respectively. Similarly, Mathews et al. [24] estimated the built-up volume for selected cities at a 1 km resolution based on scatterometer data collected by QuikSCAT. While promising, these approaches still suffer from limitations arising from the comparably low spatial resolution of the scatterometer data and from the fact that variations in the backscattering intensity can result from a multitude of effects in addition to the size and vertical growth of buildings.
Alternative large-scale approaches follow the already established standard idea of defining the local 3D built-up structure via a normalized digital surface model (nDSM), which is calculated by subtracting a modeled digital terrain model (DTM) from the original digital surface model (DSM). Marconcini et al. [25] demonstrated an approach to derive building heights at a spatial resolution of ~100 m with morphological operations that are applied to globally available DSMs such as the 12 m TanDEM-X digital elevation model (TDX-DEM). The same method was later employed by Clinton et al. [26] to conduct a worldwide estimation of built-up heights from the 30 m data of the AW3D30 DSM. Geiß et al. [27] used the TDX-DEM elevation data in combination with Sentinel-2 imagery to estimate the built-up height and density for selected urban areas. Recently, Li et al. [28] presented an approach for the mapping and analysis of 3D building structures based on random forest models applied to a large collection of earth observation data to produce maps of 3D building structures for China, Europe and the United States.
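The nDSM definition used by these approaches reduces to a per-pixel subtraction. The following is a minimal sketch in plain Python, assuming that both rasters are co-registered grids of equal shape given as nested lists. The function name `normalized_dsm` and the sample values are hypothetical, and the modelling of the DTM itself is not shown.

```python
def normalized_dsm(dsm, dtm):
    """Return nDSM = DSM - DTM per pixel for two equally shaped grids."""
    if len(dsm) != len(dtm) or any(len(a) != len(b) for a, b in zip(dsm, dtm)):
        raise ValueError("DSM and DTM must have the same shape")
    return [[s - t for s, t in zip(srow, trow)] for srow, trow in zip(dsm, dtm)]


# Hypothetical 2 x 3 example: surface heights and modelled terrain in metres.
dsm = [[520.0, 535.0, 521.0],
       [519.0, 540.0, 522.0]]
dtm = [[519.0, 520.0, 521.0],
       [519.0, 520.0, 521.0]]
ndsm = normalized_dsm(dsm, dtm)
```

In this toy grid, the two pixels with large nDSM values correspond to elevated objects, while the remaining pixels stay close to the terrain.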
However, none of the approaches developed so far have yet been used to generate a spatially detailed map (<30 m spatial resolution) of the small-scale 3D building structures at a continental or global scale. Hence, this paper presents a processing and analytics concept aiming for a high-resolution large-scale 3D mapping of human settlements. The approach is based on a joint analysis of data collected by the German TanDEM-X satellite mission, namely the TanDEM-X digital elevation model (TDX-DEM) and the underlying SAR amplitude images (TDM-AMP). In addition, multispectral Sentinel-2 (S2) imagery, Open Street Map data (OSM), and the Global Urban Footprint (GUF) settlement mask are considered in this multi-source data analytics procedure.
Section 2 of this manuscript first outlines the input data sources, followed by a detailed description of the main components of the processing framework for urban 3D mapping. This includes automated workflows for (i) the calculation of an nDSM, (ii) the generation of a building mask, and (iii) the assignment of heights to the building structures provided by the building mask.
Section 3 then specifies the basic properties of the resulting Global Urban Footprint 3D (GUF-3D) data product along with the first results of a validation exercise conducted on the basis of reference data collected for the cities of Amsterdam (NL), Indianapolis (US), Kigali (RW), Munich (DE), New York (US), Vienna (AT), and Washington (US). Finally, the conclusions are drawn and an outlook on the next developments is provided in Section 4.
3. Results
In this section, a brief product specification of the GUF-3D processing outcome is given along with a first qualitative and quantitative characterization of the GUF-3D product based on a systematic validation against ground truth data. In general, the output of the GUF-3D processing is provided as a 16-bit signed integer, Lempel-Ziv-Welch (LZW) compressed GeoTiff file in a geometric resolution of 0.4 arcsec (~12 m) and geographic coordinates (lat, lon) as the projection. The values of the GUF-3D product represent the local relative height above the ground in meters for all the pixels indicated as a building by the corresponding BM. No data regions (all areas outside the mask) are assigned the value −32,767.
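To make the stated grid spacing tangible, the following sketch converts 0.4 arcsec into metres and filters out no-data pixels before any statistics are computed. It is an illustration only: the spherical Earth approximation with the WGS84 equatorial radius, the function names, and the sample pixel values are assumptions that are not part of the product specification.

```python
import math

NODATA = -32767             # no-data value stated in the product specification
PIXEL_ARCSEC = 0.4          # grid spacing stated in the product specification
EARTH_RADIUS_M = 6378137.0  # assumption: sphere with the WGS84 equatorial radius


def pixel_size_m(lat_deg, arcsec=PIXEL_ARCSEC, radius_m=EARTH_RADIUS_M):
    """Approximate (north-south, east-west) pixel size in metres at a latitude."""
    metres_per_degree = 2.0 * math.pi * radius_m / 360.0
    ns = metres_per_degree * arcsec / 3600.0
    ew = ns * math.cos(math.radians(lat_deg))  # meridians converge towards the poles
    return ns, ew


def valid_heights(values, nodata=NODATA):
    """Drop no-data pixels so that they do not distort height statistics."""
    return [v for v in values if v != nodata]


ns_equator, ew_equator = pixel_size_m(0.0)
ns_munich, ew_munich = pixel_size_m(48.1)  # approximate latitude of Munich
heights = valid_heights([12, NODATA, 7, NODATA, 21])
```

Under these assumptions the north-south spacing is about 12.4 m everywhere, whereas the east-west spacing shrinks with latitude, so "~12 m" is a nominal value in geographic coordinates.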
Figure 2 shows the input data, intermediate layers and final result of the GUF-3D processing for the city of Munich, starting with the original TDX-DEM (Figure 2a), then the 12 m nDSM (Figure 2b), the 90 m aggregation of the nDSM (Figure 2c), and the final GUF-3D product indicating the building height estimated for all the building structures (indicated by the BM) at 12 m resolution (Figure 2d).
Figure 3 illustrates the GUF-3D layer generated for the reference city Indianapolis (US) in comparison to the 3D information provided by the corresponding ground truth data.
In the following, the results of a first validation campaign for the GUF-3D product are presented for seven cities: Amsterdam (NL), Indianapolis (US), Kigali (RW), Munich (DE), New York (US), Vienna (AT), and Washington (US). The selection of the cities was driven by the availability of accurate reference data on building heights and footprints that had been collected as close in time as possible to the TDX-DEM collection in 2012 (Appendix A). It is important to note that the presented work primarily represents a technical feasibility study that aims to indicate the effectiveness and methodological robustness of the described approach to generate urban 3D data at a large scale. A globally more representative and complete validation of the product itself will follow as soon as the GUF-3D data are available area-wide at a continental or even global scale.
For most cities, the height raster datasets were derived from very high-resolution (VHR) nDSM models produced from LiDAR data. Building footprints (vector format), in turn, were obtained from local cadastral offices (Amsterdam, Munich and Vienna) and the Microsoft building footprints dataset (Indianapolis, New York and Washington) [39]. For the particular case of Kigali, the height data and building footprints were derived from aerial images and VHR Pleiades imagery introduced by Bachofer et al. [40], who shared these data for the study at hand. To produce the final reference 12 m building height layers for each city, the average height was extracted from each VHR height dataset using the corresponding building footprints, which were then rasterized at a 12 m resolution. In total, the area covered by the reference dataset sums up to 3598.6 km². Generally, it is important to note that for the validation each reference dataset was randomly split into one portion (one-third of the total) used for the empirical determination of thresholds and parameter settings in the context of the system design and parameter tuning, while the remaining two-thirds of the reference data were used for the validation of the final GUF-3D. To this end, the data were virtually gridded into 500 × 500 pixel cells, which were randomly and exclusively assigned in the defined proportions to either the data pool for parameter testing or the one for validation.
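The block-wise split of the reference data can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name `split_cells`, the fixed random seed, the handling of partial cells at the raster border, and the raster dimensions in the example are all hypothetical.

```python
import random


def split_cells(n_rows, n_cols, cell=500, tuning_fraction=1 / 3, seed=0):
    """Tile a raster into cell x cell blocks and split the blocks at random.

    Returns two lists of (block_row, block_col) indices: one for parameter
    tuning and one for validation. Each block appears in exactly one list.
    """
    blocks_down = -(-n_rows // cell)    # ceiling division keeps border blocks
    blocks_across = -(-n_cols // cell)
    blocks = [(r, c) for r in range(blocks_down) for c in range(blocks_across)]
    rng = random.Random(seed)           # fixed seed keeps the split reproducible
    rng.shuffle(blocks)
    n_tuning = round(len(blocks) * tuning_fraction)
    return blocks[:n_tuning], blocks[n_tuning:]


# Hypothetical raster of 3000 x 4500 pixels, i.e. 6 x 9 = 54 blocks.
tuning, validation = split_cells(3000, 4500)
```

Assigning whole cells rather than single pixels keeps neighbouring, strongly correlated pixels in the same pool, which is the usual motivation for this kind of spatial split.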
When assessing the accuracy of the GUF-3D, two basic components have to be addressed. First, the quality of the extracted building masks (BMs) is assessed (1) by estimating the accuracy of the identified building pixels in each BM, and (2) by comparing the total building area per city provided by each BM with the reference data. Second, the precision of the estimated heights for the buildings is determined by comparing the height allocated to each BM pixel to its corresponding height in the reference data. Moreover, the dependency between the height estimation errors and the actual heights of the buildings was assessed at the level of building objects.
As detailed in Section 2.2.2, the design of the process for the BM generation provides the flexibility to consider different constellations for the compilation of the final structures, to which the heights formed from the nDSM analysis are assigned. For this research, the BM validation includes the assessment and cross-comparison of the two possible constellations for the building mask. In the first, OSM data is available and the BM is thus composed of the components BMOSM, BMAMP and BMnDSM. In the second, the BMNoOSM constellation is representative of all regions where no OSM data is available at all, meaning that the final building mask (BM) is solely composed of BMAMP and BMnDSM (here it is important to note that in this version the ground point detection procedure was also run without the consideration of EC1, as defined in Table 1).
The quality and accuracy of the BM and BMNoOSM, respectively, are defined in comparison to the building footprints provided by the reference data. Here, it is important to note that occasionally OSM building polygons also represent the basis for the reference building footprints. Hence, the BM can be expected to be almost identical to the reference mask in these cases or regions. Nonetheless, in those regions lacking any OSM building information, the quality of each BM will solely be defined by the accuracy of the structures provided by BMNoOSM (combination of BMAMP and BMnDSM).
Table 2 presents a summary of the results in the form of a confusion matrix calculated at the pixel level based on the binary building layers (1 = building pixel, 0 = no building pixel) of the BM and the corresponding reference building footprint representations. In terms of building identification, the results show that the producer’s accuracy (how well the reference pixels are identified in each BM) ranges between 54.04% and 100% for the BM and between 57.34% and 72.95% for BMNoOSM. The user’s accuracy (the probability that a given building pixel in each BM is also found in the reference) ranges between 53.28% and 88.87% for the BM and between 37.09% and 54.79% for BMNoOSM. Additionally, the overall accuracies of the BM range between 96.78% for Amsterdam and 76.95% for Kigali, whereas those of BMNoOSM range between 82.48% for Indianapolis and 70.28% for New York. Generally, the accuracies are substantially influenced by the proportion of BMNoOSM, with error rates increasing as the proportion of BMNoOSM relative to BMOSM within the BM increases.
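The three accuracy measures reported above follow directly from the pixel-level confusion matrix. The sketch below shows the standard definitions for the building class; the function name and the ten-pixel example are hypothetical, and it assumes that both layers contain at least one building pixel.

```python
def building_mask_accuracy(mask, reference):
    """Producer's, user's and overall accuracy (in %) for the building class.

    Both inputs are flat sequences of 0/1 pixels of equal length.
    """
    if len(mask) != len(reference):
        raise ValueError("mask and reference must have the same length")
    pairs = list(zip(mask, reference))
    tp = sum(1 for m, r in pairs if m == 1 and r == 1)  # building in both layers
    fp = sum(1 for m, r in pairs if m == 1 and r == 0)  # building only in the mask
    fn = sum(1 for m, r in pairs if m == 0 and r == 1)  # building only in the reference
    tn = sum(1 for m, r in pairs if m == 0 and r == 0)  # no building in either layer
    producers = 100.0 * tp / (tp + fn)  # share of reference building pixels found
    users = 100.0 * tp / (tp + fp)      # share of mask building pixels that are correct
    overall = 100.0 * (tp + tn) / len(pairs)
    return producers, users, overall


# Hypothetical ten-pixel example (1 = building pixel, 0 = no building pixel).
mask = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
reference = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
producers, users, overall = building_mask_accuracy(mask, reference)
```

A mask that overestimates the building area, as in this example, tends to keep the producer's accuracy high while lowering the user's accuracy.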
In addition, Figure 4 shows the relative deviation of the BM and BMNoOSM, respectively, in the form of the percentage error between the estimated total building area per city (bar plots) and the actual building area derived from the reference data (red lines).
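The percentage error of the building area can be written as a one-line computation. The sketch below assumes that both layers share the same pixel size, so that the pixel area cancels, and that positive values denote an overestimation; this sign convention and the pixel counts in the example are assumptions made for illustration.

```python
def building_area_percentage_error(mask_pixels, reference_pixels):
    """Percentage error of the mapped building area relative to the reference."""
    if reference_pixels <= 0:
        raise ValueError("reference must contain building pixels")
    return 100.0 * (mask_pixels - reference_pixels) / reference_pixels


# Hypothetical pixel counts for one city.
over = building_area_percentage_error(1500, 1000)   # mask larger than reference
under = building_area_percentage_error(900, 1000)   # mask smaller than reference
```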
Besides the quality of the building mask, the second key aspect determining the GUF-3D accuracy is the precision of the nDSM modeling and the related height assignment procedure for the buildings identified in the BM and BMNoOSM. The results of a pixel-related accuracy assessment based on the comparison between the estimated building heights (BH and BHNoOSM) and the corresponding values given by the reference data for all the studied cities are illustrated in Figure 5. In terms of the BH (heights estimated for building structures given by the BM), the results show that the mean error (ME) between the estimated heights and the real heights ranges from 0.01 m for Munich to 3.32 m for Amsterdam, indicating a tendency towards a systematic overestimation of the height values at the pixel level. In contrast, the ME values based on BHNoOSM range from −1.06 m for Indianapolis to −5.96 m for Vienna, which indicates a systematic underestimation of the height values. The reason for this underestimation is the considerable overestimation of the real building area already reported in the context of the BM validation. This leads to a disaggregation of the collected 12 m heights aggregated at the 90 m cell into too many building pixels. Consequently, the assigned heights are (proportionally at the 12 m level) too low.
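The dilution effect described above can be illustrated with a deliberately simplified toy model. It is not the height assignment procedure of Section 2: it merely assumes that a fixed height total collected in a 90 m cell is spread evenly over the building pixels of that cell, and the function name and all numbers are hypothetical.

```python
def height_per_building_pixel(cell_height_sum_m, n_building_pixels):
    """Spread a fixed 90 m cell height total evenly over its building pixels."""
    if n_building_pixels <= 0:
        raise ValueError("cell contains no building pixels")
    return cell_height_sum_m / n_building_pixels


# Hypothetical 90 m cell whose elevated 12 m pixels sum to 240 m of height.
true_mask = height_per_building_pixel(240.0, 20)      # correct building area
inflated_mask = height_per_building_pixel(240.0, 30)  # building area 50% too large
```

With the same height total, the inflated mask yields a lower height per pixel, which mirrors the direction of the bias reported for BHNoOSM.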
In addition to the ME, the mean absolute error (MAE) was calculated in order to develop a better understanding of the local distribution and range of height estimation errors. The BH reports its lowest MAE value of 2.28 m for Indianapolis and its highest MAE value of 4.89 m for Washington, whereas BHNoOSM reports its lowest MAE value of 2.21 m for Indianapolis and its highest MAE value of 6.46 m for Vienna.
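The difference between the two error measures is easiest to see on a small example. The sketch below uses the standard definitions, with the error taken as estimated minus reference height so that positive values denote overestimation (consistent with the interpretation of the ME given above); the height values are hypothetical.

```python
def mean_error(estimated, reference):
    """Mean of (estimated - reference); the sign reveals a systematic bias."""
    return sum(e - r for e, r in zip(estimated, reference)) / len(estimated)


def mean_absolute_error(estimated, reference):
    """Mean of |estimated - reference|; over- and underestimates cannot cancel."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


# Hypothetical pixel heights in metres.
estimated = [9.0, 12.0, 6.0, 15.0]
reference = [10.0, 10.0, 8.0, 14.0]
me = mean_error(estimated, reference)
mae = mean_absolute_error(estimated, reference)
```

In this example the over- and underestimates cancel in the ME, while the MAE still reports the typical size of the local errors, which is why both measures are considered together.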
While Table 2, Figure 4 and Figure 5 present the accuracy results of the GUF-3D in terms of two different components, namely the quality of the BM and the local height estimation BH at the pixel level, a complete interpretation of the results has to be derived from both assessments together. Due to the incorporation of BMOSM, which is comparable to many of the reference building footprints (see results in Table 2), the reported percentage errors in terms of the building area are lower for the BM than for BMNoOSM (see Figure 4). These lower percentage errors translate into overall more accurate building heights (lower ME and MAE values of the BH), as the final 12 m height values are more accurate due to the lower values for the percentage built-up coverage (BCA). Accordingly, with the large building area overestimations reported for BMNoOSM, the ME and MAE values reported by BHNoOSM are larger as well, due to the influence of higher BCA values in the final 12 m height values. For cities such as Indianapolis and Kigali, where BMNoOSM showed lower overestimations of the building area, the ME and MAE values are also lower than for the other cities, which confirms the correlation between both components.
An additional analysis was carried out to investigate the dependency between the actual height of the buildings and the amount of estimation errors at the building level. Here, the average height of each reference building footprint was extracted from the reference data and each BM of the GUF-3D. The average height was then converted into the number of floors, considering a standard height of 3 m for a typical floor. From here, the mean error ME was calculated between the estimated number of floors derived by the two BM versions and the reference data.
Figure 6 shows the ME values reported by each BM according to the number of floors for each study city.
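The conversion from heights to floors and the grouping of errors by building size can be sketched as follows. The 3 m floor height is taken from the text; rounding to the nearest integer with a minimum of one floor, the grouping by the reference floor count, and the sample heights are assumptions made for illustration.

```python
from collections import defaultdict

FLOOR_HEIGHT_M = 3.0  # standard floor height stated in the text


def height_to_floors(height_m, floor_height_m=FLOOR_HEIGHT_M):
    """Convert a mean building height to floors (assumed: rounded, at least 1)."""
    return max(1, round(height_m / floor_height_m))


def mean_floor_error_by_reference_floors(estimated_m, reference_m):
    """Mean error in floors (estimated - reference) per reference floor count."""
    errors = defaultdict(list)
    for est, ref in zip(estimated_m, reference_m):
        ref_floors = height_to_floors(ref)
        errors[ref_floors].append(height_to_floors(est) - ref_floors)
    return {floors: sum(errs) / len(errs) for floors, errs in sorted(errors.items())}


# Hypothetical mean heights per building in metres.
reference_m = [6.0, 6.5, 30.0, 31.0]
estimated_m = [7.0, 9.5, 21.0, 25.0]
floor_me = mean_floor_error_by_reference_floors(estimated_m, reference_m)
```

Negative values in the result indicate that buildings of the given size are underestimated on average.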
The results indicate a systematic increase in error, with higher underestimation errors from low- to high-rise buildings. This behavior can mainly be attributed to the layover effect, where the pixels belonging to high buildings are mapped closer to the satellite’s footprint and are thus positioned outside the actual location of the building footprint. In other words, as the height of the building increases, the average height of the building footprints is calculated on the basis of the backscatter coming from the lower parts of the building, leading to underestimation errors that increase with building height. At the same time, the accurate representation of building footprints and forms might be limited due to additional SAR-specific effects such as multiple scattering or shadowing.
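The order of magnitude of the layover displacement can be estimated with the standard flat-terrain relation for side-looking radar, in which a scatterer at height h is shifted towards the sensor by h / tan(θ) in ground range, with θ the local incidence angle. The sketch below applies this relation; the incidence angle of 40° and the building heights are illustrative assumptions and are not taken from the acquisition parameters of the study.

```python
import math


def layover_shift_m(height_m, incidence_deg):
    """Ground-range shift towards the sensor of a scatterer at the given height."""
    return height_m / math.tan(math.radians(incidence_deg))


def layover_shift_pixels(height_m, incidence_deg, pixel_m=12.0):
    """The same shift expressed in pixels of the 12 m grid."""
    return layover_shift_m(height_m, incidence_deg) / pixel_m


# Assumed incidence angle of 40 degrees; roof heights of a 3- and a 30-floor building.
shift_low = layover_shift_pixels(9.0, 40.0)
shift_high = layover_shift_pixels(90.0, 40.0)
```

Under these assumptions the roof of a low building stays within one 12 m pixel of its footprint, whereas the roof of a high-rise building is displaced by several pixels, so its height signal falls outside the footprint used for averaging.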
Finally, in order to better assess the effectiveness of each BM, the distribution of the ME was aggregated over all the studied cities as shown in Figure 7. The results indicate that the ME values derived from the BH are slightly lower than those reported by BHNoOSM. The interquartile range (between the 25% and 75% quantiles) of the ME values reported by the BH extends from minus two to one floor(s) for buildings between one and five floors in height, minus six to one floor(s) for buildings between six and ten floors in height, and minus eighteen to minus three floors for buildings higher than ten floors. It is worth noting, however, that of the 2,077,215 reference buildings considered for this study, 99.37% are buildings with a height between one and five floors, 0.5% are buildings with six to ten floors, and only 0.13% are buildings higher than ten floors. Hence, the largest underestimation errors only affect a very small number of buildings.
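An aggregation of this kind can be sketched with the Python standard library. The three floor classes follow the text; the use of `statistics.quantiles` with the inclusive method, the function names, and the sample errors are assumptions, so the resulting numbers are illustrative and do not reproduce Figure 7.

```python
import statistics


def floor_class(floors):
    """Assign a building to one of the three floor classes used in the text."""
    if floors <= 5:
        return "1-5"
    if floors <= 10:
        return "6-10"
    return ">10"


def interquartile_range_by_class(reference_floors, floor_errors):
    """Return (25% quantile, 75% quantile) of the floor errors per floor class."""
    grouped = {}
    for floors, err in zip(reference_floors, floor_errors):
        grouped.setdefault(floor_class(floors), []).append(err)
    result = {}
    for name, errs in grouped.items():
        # quantiles() needs at least two values per class on older Python versions.
        q1, _median, q3 = statistics.quantiles(errs, n=4, method="inclusive")
        result[name] = (q1, q3)
    return result


# Hypothetical reference floor counts and floor errors (estimated - reference).
reference_floors = [1, 2, 3, 4, 5, 12, 15, 20, 25, 30]
floor_errors = [-2, -1, 0, 0, 1, -18, -12, -9, -5, -3]
iqr = interquartile_range_by_class(reference_floors, floor_errors)
```

Because the floor classes are very unevenly populated in the reference data, reporting the class shares next to these ranges, as done above, helps to put the large errors of the highest class into perspective.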
4. Discussion
Considering the analyses and (relative) comparisons at the city level, the obtained results suggest the ability of the developed approach to provide accurate information on the absolute and relative differences in the urban morphology. Focusing on scenarios where OSM data is available, the reported mean error (ME) in comparison to the reference data ranges between 0.01 m and 3.32 m at the city level. The local absolute errors lie between 2.28 m and 5.73 m. The GUF-3D error statistics also indicate a basic suitability of the product to examine the 3D urban properties at the level of individual buildings, with mean errors ranging between minus two and one floor(s) for buildings up to five stories high. Moreover, the results reveal the increasing deviations between the estimated and true building heights from the low- to high-rise buildings.
In general, the analyses have shown that the representation of the detailed 3D building pattern and related morphological structuring significantly profits from the availability of building outlines provided by OSM. If only the 12 m BMAMP and BMnDSM data are available (BMNoOSM scenario), the ability to accurately delineate individual buildings is limited. Notably, the typical side-by-side arrangement of small single houses with surrounding trees or hedges in residential areas frequently leads to an entire (virtual) row of houses and interlaced trees being delineated as one elongated building structure. Due to the typical small-scale dimension of the vegetation features, the NDVI-based vegetation masking cannot help to avoid this effect. However, the related errors and ambiguities have been quantified by the reported systematic average overestimation of the building footprints (Figure 4) in case of a total lack of OSM building data. As an approach to (numerically) correct this effect, the built-up coverage area (Equation (12)) and area adaptation factor (Equation (9)) were introduced for the optimized estimation of the building heights (BHs) and the building volumes (BVs).
Apart from the presented numerical validation based on ground truth information, the GUF-3D was generated for >100 additional cities and settlement areas distributed all over the world in order to provide complementary qualitative insights into the types and distribution of errors in a varying environmental context (e.g., urban structural, architectural cultural settings, topography, geomorphology, climate, etc.).
Figure 8 illustrates various examples of these additional GUF-3D products, including the cities of Brasilia (BR), Cape Town (ZA), Tokyo (JP), and Mexico City (MX).
Thereby, the positive observations reported in the context of the quantitative validation campaign (see Section 3) could generally be confirmed. However, the related visual inspections also showed locally limited difficulties of the presented approach in accurately estimating the building height in certain very high-density urban areas (e.g., informal settlements, old city cores with high buildings and very narrow streets) where, under the intrinsic constraints of the SAR-related side-looking geometry, a spatially consistent visibility of the ground surface is not given (e.g., due to radar shadow). Moreover, the quality of the TDX-DEM itself is occasionally poor in these areas as well, due to phase ambiguities resulting from multiple scattering and the undersampling of the 12 m radar intensity images with respect to the small-scale urban structures. Consequently, the height of the built-up structures in these regions is underestimated because a significant number of estimated ground points are falsely positioned on building structures that show a lower height compared to their neighbors.
Inaccuracies could also be observed in built environments with very extreme topographic situations (e.g., almost continuous building coverage in very rugged terrain). Here, the achievable density and distribution of ground points is sometimes, even under optimal performance, too low to allow for a complete and accurate representation of the local peaks or sinks in the terrain. As a result, these topographic structures, and the buildings on top of them, are virtually cut by the terrain interpolation procedure. Consequently, the buildings in the direct vicinity suffer from false height assignments in the form of a drastic overestimation, in the case of terrain peaks that were cut, or severe underestimations for houses in sinks that were virtually filled. A few difficulties were also discovered in the case of large urban quarters entirely consisting of extremely high buildings, where the top of the buildings is shifted by multiple pixels so that the corresponding height value might even lie outside the aggregated 90 m cells. Finally, in rural settlements with predominantly one-story houses, the determined heights of buildings are sometimes overestimated locally due to the influence of single or small groups of high trees that push the local nDSM heights upwards, but cannot be eliminated by the vegetation mask due to a lack of spectral significance and clarity (mixed pixels) which would allow a clear separation from the surrounding built-up structures in the NDVI layer.
Generally, a test-wise application of the method to a large variety of heterogeneous city and settlement types located all over the globe showed that the key information which is analyzed, namely the characteristics of the local height variations in the TDX-DEM, is a rather robust and stable feature all around the world. Nevertheless, in an upcoming comprehensive validation campaign, all the uncertainties of the proposed methodology will be systematically identified and quantitatively and qualitatively analyzed in order to define technical improvements of the approach where required. This campaign will also be used to underpin and supplement the representativeness and statistical significance of the findings presented in this study (e.g., through an extended and more balanced coverage of reference sites for all the key urban and rural settlement types in the major cultural/geographic zones on Earth).
5. Conclusions and Outlook
In this paper, the Global Urban Footprint 3D (GUF-3D) processing framework was introduced and validated for seven city regions. This modular workflow includes the rule-based analysis of TanDEM-X digital elevation (TDX-DEM) and radar amplitude data (TDX-AMP) in combination with auxiliary layers such as building outlines derived from Open Street Map (OSM), a vegetation mask generated based on Sentinel-2 imagery, and a global map of human settlements defined by the Global Urban Footprint (GUF). The resulting GUF-3D product defines the relative local heights of all building structures within a settlement at 12 m spatial resolution.
The outcomes of a first validation campaign based on the absolute reference data collected for seven globally distributed cities indicate a high potential of the new method to effectively map the vertical extent of the built-up structures within human settlements. This observation is supported by more than 100 additional sites located on all the continents and covering the most important urban types, for which the GUF-3D was generated as well. The related qualitative inspections show that the typical characteristics of local height variations in the built environment represent a geographically stable and distinct feature which can effectively be analyzed based on the 12 m TDX-DEM. However, whereas the approach allows for an efficient description of the urban morphology at city level, the potential for a precise height estimation for individual buildings is still limited, especially where precise information on the building outlines (e.g., as provided by OSM) is lacking. In this context, the study documented the local variations of height estimation errors for specific complex settings such as very high buildings (underestimation of building heights), small one-story houses surrounded by significantly higher vegetation (overestimation of building heights), very high-density urban areas with narrow streets and/or no open (ground) space in between (underestimation of building height), or settlements built on very rugged terrain (over- and underestimation of building heights). Nevertheless, the general dimension and effects of these limitations have been reported and quantified in the Results section and are comprehensively discussed in the Discussion section.
To conclude, the presented method is intended to pave the way for the generation of large-scale GUF-3D datasets that can be expected to considerably advance studies related to refined large-scale analyses of the vertical built-up extent, built-up densities, urban morphological properties, and volumetric characteristics (e.g., floor space index). These parameters are key inputs to improve the modeling of population distributions, urban climates and carbon emissions, economic variables, or vulnerabilities and risks. In the context of the Corona/Covid-19 pandemic, the GUF-3D technique has recently been used in combination with the WSF 2015 settlement mask [12] by the World Bank [41] to predict contagion risk hotspots in several African, Asian and South American cities.
Once more GUF-3D datasets have been processed and validated, an open and free provision is foreseen, for instance via the German Aerospace Center’s (DLR) Earth Observation Center (EOC) Geoservice (https://geoservice.dlr.de) and the Urban Thematic Exploitation Platform [42] (https://urban-tep.eu).