Article

Pixel- vs. Object-Based Landsat 8 Data Classification in Google Earth Engine Using Random Forest: The Case Study of Maiella National Park

1 Department of Agricultural, Food, and Environmental Sciences, University of Perugia, 06121 Perugia, Italy
2 Dipartimento di Agraria, Università degli Studi Mediterranea di Reggio Calabria, Località Feo di Vito, 89122 Reggio Calabria, Italy
3 Maiella National Park, Via Badia 28, 67039 Sulmona, Italy
* Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(12), 2299; https://doi.org/10.3390/rs13122299
Submission received: 28 April 2021 / Revised: 8 June 2021 / Accepted: 9 June 2021 / Published: 11 June 2021

Abstract

With the general objective of producing a 2018–2020 Land Use/Land Cover (LULC) map of the Maiella National Park (central Italy), useful for a future long-term LULC change analysis, this research aimed to develop a Landsat 8 (L8) data composition and classification process using Google Earth Engine (GEE). In this process, we compared two pixel-based (PB) and two object-based (OB) approaches, assessing the advantages of integrating the textural information in the PB approach. Moreover, we tested the possibility of using the L8 panchromatic band to improve the segmentation step and the object-level textural analysis of the OB approach and produce a 15-m resolution LULC map. After selecting the best time window of the year to compose the base data cube, we applied a cloud-filtering and a topography-correction process on the 32 available L8 surface reflectance images. On this basis, we calculated five spectral indices, some of them on an interannual basis, to account for vegetation seasonality. We added elevation, aspect, and slope layers, as well as the 2018 CORINE Land Cover classification layer, to improve the available information. We applied the Gray-Level Co-Occurrence Matrix (GLCM) algorithm to calculate the image's textural information and, in the OB approaches, the Simple Non-Iterative Clustering (SNIC) algorithm for the image segmentation step. We performed an initial Random Forest (RF) optimization process, finding the optimal number of decision trees through out-of-bag error analysis. We randomly distributed 1200 ground truth points and used 70% of them to train the RF classifier and 30% for the validation phase. This subdivision was randomly and recursively redefined to evaluate the performance of the tested approaches more robustly. The OB approaches performed better than the PB ones when using the 15-m L8 panchromatic band, while the addition of textural information did not improve the PB approach. Using the panchromatic band within an OB approach, we produced a detailed, 15-m resolution LULC map of the study area.

Graphical Abstract

1. Introduction

Land use/land cover (LULC) maps represent an essential tool in managing natural resources and are a crucial component of strategies for monitoring environmental changes [1]. Anthropogenic activities in the biosphere are progressively causing profound alterations of the earth's surface, affecting the integrity and functions of ecological systems [2], their ecosystem services, and the related landscape liveability [2,3]. LULC changes have led to the reduction of various vital resources, such as water, soil, and vegetation [4], as well as of biodiversity and the related ecosystem services [5]; because these mutations are rapid, extensive, and intense, natural reserves at the local, regional, and national scales are in grave danger [6]. Diachronic LULC maps and time series of remote sensing (RS) data are essential to understand and measure these dynamics across space and time [7,8], also considering landscape gradients [9,10].
In recent decades, the increase in RS data availability, in terms of spectral, spatial, and temporal resolution, combined with the decrease in acquisition and processing costs, has led to a better exploitation of the great potential of such data for studying the earth's surface. RS data processing has recently improved thanks to cloud platforms that allow users to instantly access and analyze extensive geospatial information through web-based interfaces. Google Earth Engine (GEE) is currently one of the most widely used of these platforms, addressing remote sensing data's primary requirements: storage, composition, processing, and analysis [11]. In GEE, the user can effectively implement many approaches based on state-of-the-art algorithms for LULC classification. Its user-friendly interface and intuitive JavaScript language make the generated scripts easily reproducible and shareable through the cloud platform. Regarding base data acquisition and preparation, in GEE the user can quickly build a multi-temporal filtered dataset, useful for calculating spectral and statistical indices that improve classifiers' performance and yield a more accurate LULC classification [12].
Landsat 8 (L8) Operational Land Imager (OLI), currently the latest step of NASA's Landsat Data Continuity Mission (LDCM), produces RS data that are consistent with those from the earlier Landsat missions in terms of spectral, spatial, and temporal characteristics [13]. Thanks to this continuity, Landsat data are the only medium-resolution dataset that permits LULC change studies over multi-decadal periods. L8 provides multispectral images at a 30-m resolution, containing five visible and near-infrared (VNIR) bands and two short-wave infrared (SWIR) bands, with a revisit time of 16 days; a 15-m panchromatic band is available as well. In GEE, the 30-m L8 bands are currently available at the atmospherically corrected surface reflectance (SR) processing level [14], whereas the panchromatic band (Pan) is only available at the top-of-atmosphere (TOA) reflectance processing level [15]. The Pan band is used in pan-sharpening (or data fusion) to increase the spatial resolution of the multispectral bands (30 m) by exploiting its higher spatial resolution (15 m), allowing smaller LULC features to be detected [16,17].
LULC classification approaches are usually based on the spectral signatures of the classes of interest, calculated from training data, and on pixel- or object-based discrimination between different land cover types [18]. GEographic Object-Based Image Analysis (GEOBIA) has become increasingly popular thanks to its ability to delineate and classify objects generated at different scales [19]. Object identification is performed in a crucial step based on clustering and segmentation, which groups similar pixels into image clusters and possibly converts them to vectors [20,21]. The segmentation process has brought substantial improvements to image classification. After segmentation, GEOBIA methods generally combine spectral and geometric information with the image's texture and context information to perform the final object classification [22]. These methods generally produce better results with higher-resolution images, although the segmentation step and the subsequent textural analysis require massive computational and storage resources. Tassi and Vizzari [23] proposed an object-based (OB) approach in GEE using machine learning algorithms to produce LULC maps. Their results were encouraging for medium-high- and high-resolution images (Sentinel-2 (S2) and PlanetScope), while for L8 data, using only the 30-m bands, they obtained less satisfactory accuracy. Labib and Harris [24] evaluated the potential of feature extraction from S2 and Landsat 8 data using GEOBIA methods to compare outputs in urban areas; as in Ref. [23], the OB classification produced better results on the higher-resolution S2 data. Differently, Zhang et al. [25] showed that the moderate resolution of the Landsat 8 OLI image, combined with an object-based classification method, can effectively improve land cover classification accuracy in vegetated areas.
Simple Non-Iterative Clustering (SNIC) is widely used in GEE to perform the segmentation step, aggregating similar pixels into spatial clusters [23,26]. Mahdianpari et al. [27] produced the Canadian Wetland Inventory using an OB classification of S2 and Sentinel-1 data, based on SNIC and Random Forest (RF), to improve on pixel-based classification. Xiong et al. [28] obtained a 30-m cropland map of continental Africa by integrating pixel-based and object-based algorithms applied to S2 and L8 data with Support Vector Machine (SVM) and RF. SNIC was also successfully used to reduce the salt-and-pepper effect in mapping the LULC of Iran using Sentinel-1 and Sentinel-2 satellite datasets and an OB classification based on RF [29].
The Gray-Level Co-occurrence Matrix (GLCM) algorithm, also available in GEE, permits the calculation of image textural indices, which generally improve LULC classification [30] within GEOBIA approaches [23]. In this regard, Godinho et al. [31] combined multispectral bands with vegetation indices and GLCM features to improve LULC classification. Mananze et al. [32] derived a land cover map of Mozambique from Landsat 7 (L7) and L8 bands, vegetation indices, and textural features extracted with GLCM. In a few studies, this algorithm was also applied to panchromatic bands or pan-sharpened images to calculate finer-resolution textural metrics useful for improving the final LULC classification. Yazdi et al. [33] used textural information extracted with GLCM from the Advanced Land Imager (ALI) panchromatic band to improve LULC classification accuracy. Nejad et al. [34] improved classification results using GLCM metrics extracted from a panchromatic ALI image coupled with Hyperion hyperspectral data. Multispectral, panchromatic, and pan-sharpened medium-resolution images were previously compared as input to the most used segmentation methods (including SNIC) [35]; this comparison showed that pan-sharpened images represent the best choice for the delineation of agricultural parcels. The reliability of using pan-sharpened images to improve LULC classification accuracy in the GEE environment is still an open research question worth investigating. To the best of our knowledge, OB methods using the L8 panchromatic band and the related pan-sharpened image have not yet been implemented in GEE. Most similar research in the literature is based on the non-free eCognition software platform [25,36]. Therefore, there is still a research gap in providing scientific evidence on the use of pan-sharpened images for LULC classification in free cloud solutions such as GEE.
Non-parametric machine learning (ML) approaches do not require hypotheses on the input data distribution and produce better results than traditional classifiers in LULC classification [37]. Among these, Classification and Regression Trees (CART) is a binary decision tree classifier that uses predefined thresholds [38], while RF is an ensemble classifier based on the combination of multiple CART [39]. The latter is one of the best classifiers for LULC classification, being very efficient and precise [40]. SVM is also a good classifier, but it is more complex because it requires setting essential input parameters, such as the kernel type [41]. RF is undoubtedly the most widely applied ML algorithm in the GEE environment [42] and, in general, for remote sensing data classification, thanks to its non-parametric nature, its capability to handle dimensionality and overfitting, and its generally better performance compared to other classifiers [37,43]. RF is based on multiple CART, and the final prediction is obtained by aggregating the predictions of all of them (majority voting for classification). One of its main intrinsic limitations regards its black-box nature, since the splitting rules remain unknown to the user [44].
This research aims to classify the 2018–2020 LULC of the Maiella National Park using L8 data. At this stage, L8 was preferred to S2 in the perspective of a future long-term diachronic LULC analysis. Pursuing this objective, in the GEE environment, we compared typical pixel-based (PB) approaches with more advanced object-based (OB) ones, both based on the RF classification algorithm. In comparing these approaches, we assessed the advantages of integrating GLCM textural information in the PB approach. Moreover, we tested the possibility of using the L8 panchromatic band to improve the segmentation step and the object-level textural analysis of the OB approach and to produce a 15-m resolution LULC map. The innovative contribution of this research lies in the comparison of pixel- and object-based approaches using advanced information such as GLCM textural indices and a pan-sharpening process to improve the segmentation accuracy. Moreover, to the best of our knowledge, our research is the first to assess the accuracy of these different approaches in LULC classification mapping using as many as 10 LULC classes.

2. Materials and Methods

2.1. Study Area

The Maiella National Park, located in the Abruzzo region (central Italy), is one of the 24 Italian National Parks (Figure 1). It is known to be an endemic-rich area [44], and its flora and vegetation diversity are remarkable, considering that more than 2286 plant species have been reported for this territory, 201 of which are endemic to Italy [45] and 15 are exclusive endemics of the Maiella massif [46]. The park extends over about 741 km2 (42°04′55″ N, 14°03′36″ E), covers a mainly mountainous territory, and includes the Apennines' second-highest mountain peak (Mt. Amaro, 2793 m a.s.l.). Its slopes, especially the eastern and north-western ones, are characterized by very long and rugged valleys. Due to the high altitude, many spots are affected by cloud cover for most of the year [47].
The geographical position, the high average elevation, with more than thirty peaks exceeding 2000 m a.s.l., and the harshness and changeability of the climate make this area unique. It hosts one of the most significant levels of biological diversity in Europe, with many areas rich in wild nature that represent a characteristic part of the national biodiversity heritage [48]. It contains five Sites of Community Interest of the European Natura 2000 network, hosting more than 1800 species, many of which are endemic.
Woodlands represent the potential natural land cover up to 1700–1800 m, sometimes replaced by other formations due to different levels of anthropic use. Below 1000 m, the landscape is a fine-grain patchwork of low-intensity cultivated areas, farmlands, oak forest fragments (Quercus pubescens Willd. subsp. pubescens), and scrublands often resulting from the abandonment of traditional agricultural activities. At elevations between 1000 and 1800 m, the slopes are covered by continuous monospecific beech (Fagus sylvatica L. subsp. sylvatica) forests, in mosaic with secondary montane grasslands, partly abandoned. In localized areas, between about 1800 and 2300 m, the mugo pine (Pinus mugo Turra subsp. mugo) dominates with the most extensive formations of the Apennine chain. At higher altitudes, the peaks and high-mountain screes are colonized by extremely specialized plant communities; they are covered by snow from October to June and by clouds during most of the year [46].
Considering the spatial and temporal resolution limitations of Landsat data, we selected the LULC classification classes focusing on natural and semi-natural grasslands and on the most important related classes in their typical vegetation dynamics: pioneer eagle fern-dominated stages ("Ferns"), shrubs and scrublands ("Shrubs"), sparsely vegetated areas, and bare soil. We then added broadleaf forests, coniferous stands (including natural forests and plantations), hay meadows, other agricultural areas, and built-up areas (Table 1).

2.2. Methodological Framework

Considering the scientific background and the research objectives described above, we structured a methodology to identify the selected LULC classes using L8 data, comparing the classification accuracies of two pixel-based (PB) and two object-based (OB) approaches. As with every typical LULC classification procedure, our methodological workflow includes: (a) data composition and pre-processing, (b) image classification, and (c) accuracy assessment (Figure 2). As introduced, the choice of using L8 data is related to the possibility of performing a future long-term LULC change analysis using previous Landsat data. The first methodological step aims to develop a data composition procedure for improving the quantity and quality of information available from L8 data. In this direction, we integrate cloud-filtering, topography correction, and spectral index calculation, as well as adding ancillary data. The subsequent image classification step contains a first base PB method where individual image pixels are classified by considering the spectral information they hold compared with the spectral signatures of the selected LULC classes derived from training data. Then, in a second PB approach, we integrate the GLCM textural information derived from 30-m resolution data to assess the possibility of improving the accuracy of the first PB approach. We apply a Principal Component Analysis (PCA) to reduce the dimensionality of the multiple textural indices calculated by the GLCM algorithm. The first OB approach is characterized by a segmentation step, developed by applying the SNIC algorithm to group similar pixels into image clusters (i.e., objects). The image’s spectral and textural information used in the classification step is derived based on such clusters. Then, in a second OB approach, we integrate an L8 composite panchromatic band to assess the possibility of improving the segmentation step (using a pan-sharpened image) and the textural object analysis and produce a 15-m LULC map. In each of the compared approaches, considering the above-described advantages of RF, we apply this algorithm in the classification step and perform an initial optimization process to find the optimal number of trees through out-of-bag error analysis. In the last methodological step, we develop a robust recursive accuracy assessment of each tested approach by generating confusion matrices based on the comparison between the classification outputs and ground truth data. These matrices are used to derive a series of descriptive and analytical statistics useful to compare the performance of the different approaches and interpret and synthesize the accuracy level of the output LULC map.

2.3. Data Composition

The generation of the base composite dataset is a crucial step in every LULC classification. In this application, we used the L8 data atmospherically corrected at the Surface Reflectance (SR) level, available in GEE. The study area is characterized by persistent cloud cover for most of the year. Thus, according to [49], a sufficiently broad period of interest (in this case, three years, from 2018 to 2020) was defined to create a more effective composite image for the subsequent classification. To reduce the influence of snow cover and select the most appropriate time of the year for vegetation growth, we filtered the SR L8 image collection by a proper time interval (from 1 June to 30 November), fixing the maximum cloud cover to 20% and selecting the L8 bands from B2 to B7. Moreover, on the resulting dataset, composed of 32 images, we applied a cloud masking using the "pixel_qa" band, which allowed us to almost completely mask pixels occupied by dense and cirrus clouds and their on-ground shadows [50].
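A minimal GEE JavaScript sketch of this composition and filtering step is reported below (illustrative only, not the full script: the park boundary asset is a placeholder, while the collection ID, band names, and "pixel_qa" bit flags follow the Landsat 8 Collection 1 SR catalog entry [14]):

    // Placeholder study-area geometry; the actual park boundary asset differs.
    var aoi = ee.FeatureCollection('users/example/maiella_park').geometry();

    // Mask clouds and cloud shadows using the 'pixel_qa' bit flags (bits 3 and 5).
    function maskL8sr(img) {
      var cloudShadowBit = 1 << 3;
      var cloudBit = 1 << 5;
      var qa = img.select('pixel_qa');
      var mask = qa.bitwiseAnd(cloudShadowBit).eq(0)
                   .and(qa.bitwiseAnd(cloudBit).eq(0));
      return img.updateMask(mask);
    }

    var l8sr = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
      .filterBounds(aoi)
      .filterDate('2018-06-01', '2020-12-01')            // 2018–2020 period of interest
      .filter(ee.Filter.calendarRange(6, 11, 'month'))   // June–November window
      .filter(ee.Filter.lt('CLOUD_COVER', 20))           // max 20% scene cloud cover
      .map(maskL8sr)
      .select(['B2', 'B3', 'B4', 'B5', 'B6', 'B7']);

    print('Number of images', l8sr.size());              // 32 scenes for the study area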
In mountainous areas, topography strongly influences surface reflectance due to the steep slopes. In such places, an effective topographic correction is ordinarily necessary to reduce the topographic effect before the subsequent spectral index calculation and image classification [51]. To this aim, using the 30-m Shuttle Radar Topography Mission Digital Elevation Model (SRTM DEM), we applied a Sun-Canopy-Sensor + C (SCSc) correction method [52,53], successfully applied in previous studies [54,55] and already implemented in a GEE script [56]. To correct some noise and aberrations observed in the first SCSc outputs, we smoothed the DEM by convolving it with a 2-cell radius low-pass filter.
The addition of spectral indices generally increases the available information and improves the final classification [57,58]. In this regard, for each image in the period of interest, we computed various spectral indices. To better discriminate the built-up and bare soil classes and the differences between agricultural and non-agricultural land [59], we calculated the Bare Soil Index (BSI) and the Normalized Difference Built-up Index (NDBI) (Table 2).
BSI is used in LULC studies to differentiate bare soil from other land cover types [60] and to classify and monitor seasonal bare soil areas [61] (Table 2). Mapping built-up areas represents a challenging task due to their spectral similarity with bare soil. NDBI is very helpful for built-up mapping [62,63], demonstrating more effectiveness than other remotely sensed indices [64] (Table 2). To better discriminate the various vegetation cover types, we calculated the Normalized Difference Vegetation Index (NDVI), the Green Atmospherically Resistant Vegetation Index (GARI), and the Chlorophyll Vegetation Index (CVI) (Table 2).
NDVI is a widely used vegetation index because it directly expresses a measure of vegetation health. The combination of its normalized difference formulation and the use of the highest absorption and reflectance regions of chlorophyll makes it robust over a wide range of conditions. It produces significant improvements in classification accuracy [57], and combinations of spectral measurements at different wavelengths have been used in a plethora of studies to improve classification accuracies [69]. GARI is more sensitive to a wide range of chlorophyll concentrations and less sensitive to atmospheric effects than NDVI. At the same time, CVI has an increased sensitivity to chlorophyll content, mainly in deciduous cover, thanks to the introduction of the green band.
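The per-image index calculation can be sketched as follows (continuing from the filtered collection above; the exact formulations are those reported in Table 2, while the expressions below use common formulations of BSI, GARI, and CVI, with a 1.7 blue–red weighting assumed for GARI):

    // L8 SR bands: B2 blue, B3 green, B4 red, B5 NIR, B6 SWIR1.
    function addIndices(img) {
      var ndvi = img.normalizedDifference(['B5', 'B4']).rename('NDVI');
      var ndbi = img.normalizedDifference(['B6', 'B5']).rename('NDBI');
      var bsi = img.expression(
        '((SWIR1 + RED) - (NIR + BLUE)) / ((SWIR1 + RED) + (NIR + BLUE))', {
          SWIR1: img.select('B6'), RED: img.select('B4'),
          NIR: img.select('B5'), BLUE: img.select('B2')
        }).rename('BSI');
      var gari = img.expression(
        '(NIR - (GREEN - 1.7 * (BLUE - RED))) / (NIR + (GREEN - 1.7 * (BLUE - RED)))', {
          NIR: img.select('B5'), GREEN: img.select('B3'),
          BLUE: img.select('B2'), RED: img.select('B4')
        }).rename('GARI');
      var cvi = img.expression('NIR * RED / (GREEN * GREEN)', {
        NIR: img.select('B5'), RED: img.select('B4'), GREEN: img.select('B3')
      }).rename('CVI');
      return img.addBands(ndvi).addBands(ndbi).addBands(bsi).addBands(gari).addBands(cvi);
    }
    var withIndices = l8sr.map(addIndices);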
To compose the 30-m resolution Base Data Cube (BDC), we calculated the median images for the L8 bands of the selected 32 images and those of the derived BSI and NDBI. Differently, we computed the median NDVI, GARI, and CVI for three time sub-windows: June–July (JJ), August–September (AS), and October–November (ON). This improves the classification by considering the vegetation variations during the selected time window and mitigating the influence of slight changes due to meteorological conditions. To enhance the classification of agricultural areas, we added the 2018 CORINE Land Cover (CLC) layer as ancillary data. We chose this layer considering the availability of previous versions, in view of the above-mentioned future long-term LULC change analysis using older Landsat data. CLC is a project initiated in 1985 that consists of an inventory of land cover in 44 classes. CLC uses a Minimum Mapping Unit (MMU) of 25 hectares for areal phenomena and a minimum width of 100 m for linear phenomena [70]. The goal of CLC is a standardized data collection on land in Europe to support environmental policy development. Finally, we added the 30-m SRTM DEM and the derived slope and aspect layers to consider the morphological features, which are crucial for an area of interest with a complex morphology such as the Maiella Park area [71].
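Continuing the sketch, the BDC can be assembled from the overall and seasonal medians plus the ancillary layers (the SRTM and CORINE asset IDs are the standard GEE catalog entries; the topographic correction step [56] is omitted here for brevity):

    // Seasonal medians of the three vegetation indices.
    function seasonMedian(col, startMonth, endMonth, names) {
      return col.filter(ee.Filter.calendarRange(startMonth, endMonth, 'month'))
                .select(['NDVI', 'GARI', 'CVI'])
                .median()
                .rename(names);
    }

    var dem = ee.Image('USGS/SRTMGL1_003');
    var terrain = ee.Terrain.products(dem).select(['elevation', 'slope', 'aspect']);
    var clc = ee.Image('COPERNICUS/CORINE/V20/100m/2018').select('landcover').rename('CLC');

    var bdc = withIndices
      .select(['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'BSI', 'NDBI']).median()
      .addBands(seasonMedian(withIndices, 6, 7, ['NDVI_JJ', 'GARI_JJ', 'CVI_JJ']))
      .addBands(seasonMedian(withIndices, 8, 9, ['NDVI_AS', 'GARI_AS', 'CVI_AS']))
      .addBands(seasonMedian(withIndices, 10, 11, ['NDVI_ON', 'GARI_ON', 'CVI_ON']))
      .addBands(terrain)
      .addBands(clc)
      .clip(aoi);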

2.4. Classification

The LULC classification is computed using the RF machine learning classifier. RF utilizes bootstrap aggregation (bagging) to build multiple decision trees (DT) and combines the results through a majority voting method to make a more accurate prediction [40]. Since each DT uses a bagging method, on average 36.8% of the samples do not play any role in each classifier's training phase; these are called Out-Of-Bag (OOB) instances [72] and are used to estimate the OOB error, which evaluates the accuracy of the RF. In general, increasing the number of DT improves performance; however, a threshold can be identified beyond which any increase in the number of trees brings no significant performance gain while increasing the computational cost [73]. An initial calibration/parameterization to determine the optimal number of DT for the training phase using the lowest OOB error is common practice in tuning the parameters [73,74,75]. In this direction, in our application, RF classifiers were trained progressively increasing the number of trees (from 1 to 200). Analyzing the OOB variation (accessible in GEE through the "describe" command), we identified the optimal number of trees for each of the four proposed classification approaches.
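The tuning loop can be sketched as follows ("groundTruth" is a placeholder for the FeatureCollection of labelled points with a "landcover" property, and the 'outOfBagErrorEstimate' key is the one assumed to be exposed by the explain output of ee.Classifier.smileRandomForest):

    // Sample the BDC at the ground-truth points (30-m scale).
    var training = bdc.sampleRegions({
      collection: groundTruth,
      properties: ['landcover'],
      scale: 30
    });
    var bandNames = bdc.bandNames();

    // Train RF with an increasing number of trees and collect the OOB error;
    // in practice a coarser step (e.g., every 10 trees) keeps computation manageable.
    var oobErrors = ee.List.sequence(1, 200).map(function(n) {
      var rfTest = ee.Classifier.smileRandomForest({numberOfTrees: n})
        .train({features: training, classProperty: 'landcover', inputProperties: bandNames});
      return ee.Dictionary(rfTest.explain()).get('outOfBagErrorEstimate');
    });
    print('OOB error vs. number of trees', oobErrors);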
Considering the data availability, and given the research objective and the methodological framework, we compared four different approaches in the classification step setting up four different GEE scripts:
  • Pixel-based (PB);
  • Pixel-based including the image textural information (PBT);
  • Object-Based, using BDC (OB);
  • Object-Based, using the L8 15-m panchromatic band and the BDC (OBP).
In the pixel-based classification approach (PB), the image is quickly classified in GEE using all the bands and spectral indices included in the BDC, after the definition of the RF classifier and the subsequent training phase. A final morphological operation (based on a focal mode) is applied to clean up the final output and reduce the salt-and-pepper effect.
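A minimal sketch of the PB classification and cleanup, reusing the training samples defined above (the focal-mode parameters are illustrative):

    // Pixel-based classification with the tuned RF (100 trees).
    var rf = ee.Classifier.smileRandomForest(100)
      .train({features: training, classProperty: 'landcover', inputProperties: bandNames});
    var pbMap = bdc.classify(rf);

    // Focal-mode (majority) filter to reduce the salt-and-pepper effect.
    var pbMapClean = pbMap.focal_mode({radius: 1, kernelType: 'square', units: 'pixels'});
    Map.addLayer(pbMapClean, {}, 'PB LULC');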
In the PBT approach, we added to the previous classification the textural features extracted using GLCM, a widely used algorithm available in GEE that requires a grey-level 8-bit image as input and generates 18 different textural indices. GLCM computes second-order statistics of texture features from the statistical distribution of observed combinations of intensities at specified positions relative to each other in the image [30]. A weighted linear combination (Equation (1)), typically used for RGB-to-grayscale image conversion [76], was applied to the B3, B4, and B5 bands to generate the input for the GLCM step, as follows:
Gray = (0.3 × B5) + (0.59 × B4) + (0.11 × B3),
We selected the seven most relevant textural indices, as suggested by Hall-Beyer et al. [62]. After a standardization step, we applied a Principal Component Analysis (PCA) to synthesize the most relevant textural information in a reduced number of bands. For the final classification, we selected the first 4 PCs, accounting for 99.36% of the initial textural information (pc1 = 81.70%, pc2 = 11.40%, pc3 = 5.22%, pc4 = 1.04%). In this case as well, a focal-mode kernel is applied to clean up the final output and reduce the salt-and-pepper effect.
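The textural feature extraction can be sketched as follows (the 8-bit rescaling assumes SR values in the 0–10,000 range, the GLCM window size is illustrative, and the seven indices shown are an example of the selected subset; the standardization and PCA steps are omitted):

    // Weighted grayscale composite (Equation (1)) rescaled to 8 bits.
    var gray = bdc.expression('0.3 * NIR + 0.59 * RED + 0.11 * GREEN', {
      NIR: bdc.select('B5'), RED: bdc.select('B4'), GREEN: bdc.select('B3')
    }).rename('gray');
    var gray8bit = gray.unitScale(0, 10000).multiply(255).toByte();

    // GLCM textural indices; seven of the 18 output bands are retained.
    var glcm = gray8bit.glcmTexture({size: 4});
    var textures = glcm.select([
      'gray_asm', 'gray_contrast', 'gray_corr', 'gray_var',
      'gray_idm', 'gray_savg', 'gray_ent'
    ]);
    // The retained indices are then standardized and reduced with a PCA;
    // the first four principal components enter the classification stack.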
The OB approach typically includes a segmentation step during which similar pixels are grouped into image clusters, possibly converted into vectors [21]. In our application, considering the L8 resolution, the objects are represented by the patches composing the study area's landscape. For this purpose, we used the SNIC algorithm, widely used in GEE [27]. This algorithm, an improved version of the Simple Linear Iterative Clustering (SLIC) algorithm, selects the pixels one by one and assigns them to a superpixel. This process considers both the distance between a given pixel and the superpixel's center and the difference between the pixel's color and the superpixel's mean color [26]. In GEE, the user can set some parameters such as "compactness", which influences the cluster shape, "connectivity", to consider the contiguity based on the merge of adjacent clusters, and "neighborhoodSize", to avoid tile boundary artifacts. In detail, considering the characteristics of the study area, we set "compactness" = 0, "connectivity" = 4, "neighborhoodSize" = 128, and "seed spacing" = 7. The latter parameter, expressed in pixels, defines an initial regular grid that influences the cluster size. The textural object analysis is the typical subsequent step in the OB classification; it considers object textures (i.e., the spatial relationships between adjacent pixels), improving the LULC classification performance. We developed the first OB approach (OB) through a spatial cluster identification using SNIC on a 4-band composite image including B2, B3, B4, and B5 of the BDC. The textural information was the same obtained for the PBT approach. Before proceeding to the RF classification, the median value of all the selected features of the BDC and of the textural PCs is calculated for each spatial cluster. The most represented CORINE class was determined for each cluster as well.
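The segmentation and per-cluster aggregation can be sketched as follows (parameter values as reported above; "features" is a placeholder for the full stacked classification image, i.e., the BDC plus the textural principal components):

    // SNIC segmentation on a 4-band composite, seeded on a regular grid.
    var seeds = ee.Algorithms.Image.Segmentation.seedGrid(7);   // "seed spacing" = 7
    var snic = ee.Algorithms.Image.Segmentation.SNIC({
      image: bdc.select(['B2', 'B3', 'B4', 'B5']),
      compactness: 0,
      connectivity: 4,
      neighborhoodSize: 128,
      seeds: seeds
    });
    var clusters = snic.select('clusters');

    // Per-cluster median of all the classification features.
    var objectFeatures = features.addBands(clusters)
      .reduceConnectedComponents({
        reducer: ee.Reducer.median(),
        labelBand: 'clusters',
        maxSize: 256
      });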
The second OB approach (OBP) was based on integrating the 15-m L8 Pan band. For the same time windows used for the BDC, we calculated a median Pan image. Applying the “rgbToHsv” and “hsvToRgb” methods, we developed a spectral transformation of the L8 30-m RGB bands of the BDC to generate a pan-sharpened composite image at a 15-m resolution [77]. On this layer, we applied the SNIC algorithm to identify spatial clusters (“seed spacing” = 10), while the Pan band was used to calculate GLCM metrics. Also, in this case, we selected the first 4 PCs, which account for 99.67% of the initial textural information (pc1 = 81.07%, pc2 = 12.03%, pc3 = 4.38%, pc4 = 2.18%). Before proceeding to RF classification, the median value of all selected features of BDC and PCs is calculated for each spatial cluster. As performed in the previous OB approach, the most represented CORINE class was determined for each cluster. Using image clusters generated using SNIC at 15-m resolution, we produced an output LULC map at the same spatial resolution.
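The HSV-based pan-sharpening can be sketched as follows (this follows the standard GEE rgbToHsv/hsvToRgb pattern; the RGB rescaling again assumes SR values in the 0–10,000 range, and the TOA collection ID is the catalog entry [15]):

    // Median 15-m panchromatic (B8) composite over the same time windows.
    var panMedian = ee.ImageCollection('LANDSAT/LC08/C01/T1_TOA')
      .filterBounds(aoi)
      .filterDate('2018-06-01', '2020-12-01')
      .filter(ee.Filter.calendarRange(6, 11, 'month'))
      .filter(ee.Filter.lt('CLOUD_COVER', 20))
      .select('B8')
      .median();

    // Replace the value channel of the 30-m RGB composite with the Pan band.
    var rgb = bdc.select(['B4', 'B3', 'B2']).unitScale(0, 10000);
    var hsv = rgb.rgbToHsv();
    var panSharpened = ee.Image.cat([
      hsv.select('hue'), hsv.select('saturation'), panMedian
    ]).hsvToRgb();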
In GEE, it is possible to plot a histogram chart to visualize each feature's importance in the classification, using the variable importance included in the output of the "explain" command. These outputs can be normalized to express the importance of each feature in the 0–100% range. This graph is handy for empirically analyzing the most relevant information utilized for the LULC classification. We used this chart, combined with the accuracy outputs, to select the most relevant RF classification features.
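The normalized importance can be extracted and charted from the trained classifier as sketched below (the 'importance' key of the explain output is assumed, as exposed by smileRandomForest):

    // Normalized (0–100%) feature importance from the trained RF.
    var explained = ee.Dictionary(rf.explain());
    var importance = ee.Dictionary(explained.get('importance'));
    var total = ee.Number(importance.values().reduce(ee.Reducer.sum()));
    var relative = importance.map(function(band, value) {
      return ee.Number(value).divide(total).multiply(100);
    });
    print(ui.Chart.array.values(ee.Array(relative.values()), 0, relative.keys())
      .setChartType('ColumnChart'));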

2.5. Accuracy Assessment

We collected 1200 randomly distributed points, with a reciprocal minimum distance of 500 m, to train the classifier and validate the LULC classification results. These points were classified in QGIS, visually integrating information from Bing Maps, Google Maps, L8 RGB and pan-sharpened images, L8 NDVI profiles, and the "Carta della Natura" map [78]. In each iteration, according to a random selection, 70% of these points are used for the training phase and 30% for the validation phase. This 70/30 ratio is very common in RF applications (see, e.g., [79,80,81]). We randomly and recursively redefined this subdivision, over 50 iterations, to evaluate the tested approaches' performance more robustly. The accuracy of the four approaches was assessed using the confusion matrix (CM) method, a widely used approach based on the comparison of the classification outputs with the ground truth data [82,83]. Some specific accuracy metrics were derived from the matrix: the Overall Accuracy (OA, Equation (2)), the Producer's Accuracy (PA, Equation (3)) (also known as "recall"), and the User's Accuracy (UA, Equation (4)) (also known as "precision"), according to the following formulas:
OA = Number of correct plots / Total number of plots (2)
PA = Number correctly identified in reference plots of a given class / Number actually in that reference class (3)
UA = Number correctly identified in a given map class / Number claimed to be in that map class (4)
F-score = (2 × PA × UA) / (PA + UA) (5)
The OA is defined as the degree of closeness of the results to values accepted as true [84]. PA, or "recall", is the ratio between the true positives and the total number of true positives and false negatives, while UA, or "precision", is the ratio between the true positives and the total number of true positives and false positives. The UA is the complement of the commission error, while the PA is the complement of the omission error. The F-score is a composite measure of algorithm reliability that balances sensitivity and specificity [85]. It is calculated as the harmonic mean of PA and UA and attempts to describe the balance between the two [86,87] (Equation (5)). A classifier is considered highly accurate when it obtains a combination of high recall and precision values. In particular, regarding the LULC classification, a high precision (UA) corresponds to a low level of commission errors and a high recall (PA) to a low level of omission errors [88].
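One iteration of this recursive assessment can be sketched as follows (the seed of randomColumn changes at each of the 50 iterations; "groundTruth" and "landcover" are the placeholders used in the earlier sketches):

    // Random 70/30 split of the ground-truth samples.
    var samples = bdc.sampleRegions({
      collection: groundTruth, properties: ['landcover'], scale: 30
    }).randomColumn('random', 42);
    var trainSet = samples.filter(ee.Filter.lt('random', 0.7));
    var testSet = samples.filter(ee.Filter.gte('random', 0.7));

    // Train, validate, and derive the confusion-matrix metrics.
    var rfIter = ee.Classifier.smileRandomForest(100)
      .train({features: trainSet, classProperty: 'landcover', inputProperties: bdc.bandNames()});
    var cm = testSet.classify(rfIter).errorMatrix('landcover', 'classification');
    print('Overall accuracy', cm.accuracy());
    print("Producer's accuracy (recall)", cm.producersAccuracy());
    print("User's accuracy (precision)", cm.consumersAccuracy());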

3. Results

The initial topographic correction produced an adequate illumination re-balancing of the base L8 images (Figure 3).
The OOB error used to identify the optimal number of trees for the RF classifiers in the four proposed approaches shows a typical decreasing trend as the number of trees increases (Figure 4). Stable results for all methods are obtained after about DT = 100. This threshold was chosen as the optimal compromise between accuracy and computational cost. In detail, the OOB errors obtained for the PB, PBT, and OB approaches at DT = 100 fall in a similar range of values (PB = 0.198, PBT = 0.199, OB = 0.185), while the OBP approach showed the best performance, producing the lowest OOB error (OBP = 0.151). These values reflect the average accuracy indices calculated subsequently for the same four approaches.
The output maps from the four tested approaches show the spatial distribution of the selected LULC classes (Figure 5). Figure 6 shows some of the slight differences between the PB and PBT approaches in a region of the study area. The same figure shows the differences in the SNIC outputs and in the final LULC classifications of the OB and OBP approaches. Finally, the same figure represents the SNIC output and the related LULC classification from the OBP method, where the whole process is developed at 15-m resolution.
The 50 iterative RF experiments computed for each approach generated information on the variability of the accuracy indices (OA, PA, UA, and F-score) (Figure 7). The graph clearly shows that neither the addition of textural bands in the PBT approach nor the adoption of an object-based approach on the 30-m resolution dataset improved the accuracy results obtained with the pixel-based classification approach. Using the higher-resolution panchromatic band, the OBP method achieves significantly better results than the other approaches for all four accuracy indices.
A deeper insight into the reliability of the results is provided by comparing the average PA and UA for the LULC classes (Figure 8). From the producer's perspective (omission errors), the OB approach generally performs better than or equal to the other approaches in classifying all the classes except AGR, which is slightly better classified by the PB approaches. In general, two classes result the most problematic: CNF, mainly due to the confusion with SHR and BLF, and SHR, due to the confusion with GRS and BLF. The OBP approach remarkably improves the PA of the latter and that of FER. From the user's perspective (commission errors), the OBP approach significantly improves the UA of the BSR, GRS, BLF, and SHR classes.
The UA results for the various LULC classes are satisfactory for all the approaches (>70%), except for SHR, which results problematic with all the tested approaches due to commission errors with various classes such as GRS, BLF, CNF, and AGR.
The normalized importance of the RF classification input features explains the relevance of the various bands in the four tested approaches (Figure 9). This output supports understanding the relative weight of the various bands in the four classifications and selecting the most relevant RF classification features to speed up the classification step. The feature importance for all four approaches shows a good balance among the features in the final classification, confirming their relevance for class discrimination. Only the aspect layer resulted slightly less important than the other morphological information in all the approaches except PB. The 2018 CORINE LULC layer was relevant for all the approaches, but less so for the OB and OBP methods. All four textural PCs are characterized by almost equal importance in the three approaches that integrated them.

4. Discussion

GEE has achieved substantial success thanks to its free cloud-based platform, which overcame the intrinsic limitations of remote sensing applications in terms of data storage, retrieval, and pre-processing. The exponential growth of machine learning in LULC classification, applied in pixel-based and object-based approaches, is making exceptional performance and innovative methods increasingly affordable. In this research, GEE represented the programming environment that allowed complex image processing and analysis using the JavaScript language. GEE proved very effective in the BDC composition, achieved by selecting the best time window of the year, applying cloud-filtering and topography-correction algorithms, calculating various spectral indices, and integrating morphological and ancillary data.
Differently from previous research (e.g., [27,28,29,43]), we successfully compared and assessed pixel- and object-based approaches using advanced information such as GLCM textural indices and a pan-sharpening process to improve the segmentation accuracy. We used medium-resolution data (i.e., Landsat 8) in view of future developments in diachronic studies and LULC change analysis over multi-decadal periods. To the best of our knowledge, our research is the first to assess the accuracy of these different approaches in LULC classification mapping using as many as 10 LULC classes. All approaches are based on the RF machine learning algorithm which, recently, has become the most widely applied algorithm in remote sensing data classification thanks to its non-parametric nature and its capability to handle overfitting by building multiple DTs. The definition of the DT number is related to an optimal threshold beyond which any increase would bring no significant performance gain and would merely raise the computational cost [74]. In this regard, we successfully performed an initial calibration and tuning of this parameter utilizing the lowest OOB error and making use of all the randomly distributed points within the study area. With this approach, we limited the number of DT, favoring a faster RF classification and a better utilization of GEE's computational resources.
This work assessed the possible advantages of adding the textural information derived from L8 30-m resolution data to the pixel-based approach. In this direction, we compared the PB approach, using all the median bands of the dataset, with the PBT approach, which adds the textural information. Even though the textural information was quite relevant for the final PBT classification (Figure 9), it did not improve the overall RF performance. This is probably due to the impossibility of reading discriminant textures of the landscape patches composing the study area at the native resolution of the L8 multispectral bands. It could also be due to difficulties in using textural information without implementing an object-based approach, where this information is synthesized at the object/patch level. However, even with an object-based approach, the textural information derived from 30-m resolution L8 data slightly worsened the PB accuracy results (Figure 7). This evidence is coherent with a previous study in which the object-based approach in GEE produced encouraging results for S2 (10-m geometric resolution) and PlanetScope (3-m resolution) data while, for L8 data, less satisfactory accuracy results were obtained [23].
The addition of the morphometric features (elevation, slope, aspect) and of the 2018 CORINE LULC layer as ancillary data was important for all the classification approaches. The former is quite common in LULC classification [89] and, also in this application, proved very useful for improving the classification accuracy. Differently, the integration of pre-existing LULC layers is, to the best of our knowledge, less common. The cloudy nature of the study area required a preliminary data composition based on median values over long time windows to create a cloud-free base image. This choice prevented using a more detailed multi-temporal dataset to distinguish the meadows and the other agricultural areas from other classes, such as grasslands and sparsely vegetated areas. Even the use of vegetation index statistics (i.e., standard deviation, minimum, maximum) in a preliminary stage of the research did not work for this purpose. Differently, the use of the CORINE layer was helpful for the discrimination of the above-indicated LULC classes.
As already performed in previous studies [57,58], the proposed methodology enriched the derived spectral information by calculating specific spectral indices useful for classifying specific LULC classes (BSI and NDBI). These indices allowed a better discrimination of the built-up and bare soil classes and of the differences between agricultural and non-agricultural land. Moreover, we calculated three VIs (NDVI, GARI, and CVI) to improve the classification. In this case, we derived these VIs according to three time sub-windows (June–July, August–September, and October–November), considering the vegetation variations during these time windows and mitigating the influence of slight changes in meteorological conditions.
Considering these primary limitations of object-based approaches using only the 30-m bands, we assessed the possibility and the advantages of integrating the L8 panchromatic band in the segmentation step and in the textural object analysis to improve the L8 data classification output. In previous studies, the integration of this band and of the derived pan-sharpened image within object-based methods determined an improvement of the accuracy, but an evident limitation is linked to the use of eCognition, a non-free, proprietary software platform. In our research, the SNIC segmentation applied to an L8 pan-sharpened image within the OBP method resulted more effective and detailed (considering the higher spatial resolution) in the cluster delineation and improved the results obtained with the OB approach (Figure 5). The subsequent GLCM applied to the composite panchromatic band improved the textural information associated with the different patches. As already performed in [23], in both the OB and OBP approaches, we successfully applied a PCA that helped select a few relevant textural bands to be integrated into the LULC classification. The addition of PC3 and PC4, even though they contain a small quantity of textural information, was significant for the final classification (Figure 9).
This study implemented a consistent dataset of 1200 points used as training/validation ground truths, which allowed obtaining reliable accuracy results. Moreover, we compared the four approaches by analyzing the mean confusion matrices obtained from a random and recursive subdivision of the ground-truth dataset (50 iterations). This step, rarely performed in similar RS applications, allowed us to more robustly evaluate the performance of the tested approaches using the statistical distributions of the overall accuracy (OA), the producer's accuracy (PA), the user's accuracy (UA), and the F-score. These metrics confirmed the reliability of all four approaches, showing their general applicability in cloudy and morphologically challenging areas. The accuracy results for some classes reflect the limitations of L8 data in discriminating specific classes (especially SHR and CNF). However, the accuracy of the other classes is satisfactory for all the tested approaches, and the OBP approach considerably improved the classification of some specific, critical LULC classes (BSR, GRS, BLF, and SHR).
Using a 15-m pan-sharpened composite RGB image in the OBP method, instead of the linear combination of 30-m bands used in the OB approach, improved the spatial clustering of pixels potentially belonging to the same landscape patch and the calculation of the GLCM metrics. This research shows for the first time how the integration of the panchromatic band in an OB approach is possible in GEE and can produce an appreciable improvement in all the considered accuracy parameters and in the final map quality. The output LULC maps from the OBP approach, other than being more accurate, are also very detailed and effective in representing the landscape's patches (Figure 5). This result is very promising and interesting since, even in a free platform such as GEE, it was possible to implement advanced classification processes to fully exploit the informative content of L8 data.

5. Conclusions

This work aimed to compare and assess pixel-based and object-based approaches based on RF to classify the LULC of the Maiella National Park using Landsat 8 data. We assessed the advantages of integrating textural information in the pixel-based approach and tested the possibility of using the L8 panchromatic band to improve the segmentation step and the object-level textural analysis of the object-based approach and to produce a 15-m resolution LULC map. A random and recursive subdivision of the ground-truth dataset into training and validation subsets supported a more robust evaluation of the performance of the tested approaches using the overall accuracy (OA), the producer's accuracy (PA), the user's accuracy (UA), and the F-score. These indices confirmed the good reliability and applicability of all four approaches in cloudy and morphologically complex areas such as the Maiella National Park.
The inclusion of the textural information in the traditional pixel-based approach and the application of the OB method to 30-m resolution L8 data did not improve the accuracy results of the traditional PB method. Differently, the integration of the 15-m resolution L8 panchromatic band, both in the object delineation (using a pan-sharpened image) and in the image textural analysis of the OBP method, significantly improved the accuracy of the LULC classification. Our research findings demonstrated the possibility of producing, in GEE, 15-m resolution LULC maps starting from L8 data and using the OBP method. Further improvements of the LULC classifications may be obtained with more advanced pan-sharpening methods allowing more bands to be included (e.g., the NIR band).
Future improvements of GEE available computational resources and segmentation algorithms may allow users to produce better LULC classification outputs from L8 data even at 15-m resolution. The proposed straightforward and cost-effective methodological approach may be successfully implemented in other critical regions of interest where the logistic and economic costs associated with land-use and land-cover monitoring are generally considerable for land managers.

Author Contributions

Conceptualization: M.V. and A.T.; methodology, M.V., A.T., D.G., G.M. and L.D.M.; software, data curation, and validation: A.T., M.V. and D.G.; original draft preparation: M.V. and A.T.; writing—review and editing, M.V., A.T., D.G., G.M. and L.D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was developed within the research project “Studio del valore pabulare dei pascoli e linee guida per la gestione e la conservazione degli habitat 6170-6210-6230-6510 dell’Allegato I della Direttiva 43/92 CEE” developed by the Department of Agricultural, Food, and Environmental Sciences, University of Perugia, funded by the Maiella National Park.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data used in the research are openly available in the USGS archives (https://earthexplorer.usgs.gov/, accessed on 5 July 2021) and within Google Earth Engine. The GEE codes developed in this research are available, upon any reasonable request, by emailing the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Rawat, J.S.; Biswas, V.; Kumar, M. Changes in land use/cover using geospatial techniques: A case study of Ramnagar town area, district Nainital, Uttarakhand, India. Egypt. J. Remote Sens. Sp. Sci. 2013. [Google Scholar] [CrossRef] [Green Version]
  2. Antognelli, S.; Vizzari, M. Ecosystem and urban services for landscape liveability: A model for quantification of stakeholders’ perceived importance. Land Use Policy 2016, 50. [Google Scholar] [CrossRef]
  3. Antognelli, S.; Vizzari, M. Landscape liveability spatial assessment integrating ecosystem and urban services with their perceived importance by stakeholders. Ecol. Indic. 2017, 72, 703–725. [Google Scholar] [CrossRef]
  4. Padoch, C.; Brondizio, E.; Costa, S.; Pinedo-Vasquez, M.; Sears, R.R.; Siqueria, A. Urban forest and rural cities: Multi-sited households, consumption patterns, and forest resources in Amazonia. Ecol. Soc. 2008, 13. [Google Scholar] [CrossRef] [Green Version]
  5. Tassi, A.; Gil, A. A Low-cost Sentinel-2 Data and Rao’s Q Diversity Index-based Application for Detecting, Assessing and Monitoring Coastal Land-cover/Land-use Changes at High Spatial Resolution. J. Coast. Res. 2020. [Google Scholar] [CrossRef]
  6. Twisa, S.; Buchroithner, M.F. Land-use and land-cover (LULC) change detection in Wami river basin, Tanzania. Land 2019, 8, 136. [Google Scholar] [CrossRef] [Green Version]
  7. Fichera, C.R.; Modica, G.; Pollino, M. Land Cover classification and change-detection analysis using multi-temporal remote sensed imagery and landscape metrics. Eur. J. Remote Sens. 2012, 45, 1–18. [Google Scholar] [CrossRef]
  8. Modica, G.; Vizzari, M.; Pollino, M.; Fichera, C.R.R.; Zoccali, P.; Di Fazio, S. Spatio-temporal analysis of the urban–rural gradient structure: An application in a Mediterranean mountainous landscape (Serra San Bruno, Italy). Earth Syst. Dyn. 2012, 3, 263–279. [Google Scholar] [CrossRef] [Green Version]
  9. Vizzari, M.; Sigura, M. Landscape sequences along the urban–rural–natural gradient: A novel geospatial approach for identification and analysis. Landsc. Urban Plan. 2015, 140, 42–55. [Google Scholar] [CrossRef]
  10. Vizzari, M.; Hilal, M.; Sigura, M.; Antognelli, S.; Joly, D. Urban-rural-natural gradient analysis with CORINE data: An application to the metropolitan France. Landsc. Urban Plan. 2018. [Google Scholar] [CrossRef]
  11. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Remote Sensing of Environment Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017. [Google Scholar] [CrossRef]
  12. Griffiths, P.; van der Linden, S.; Kuemmerle, T.; Hostert, P. A Pixel-Based Landsat Compositing Algorithm for Large Area Land Cover Mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013. [Google Scholar] [CrossRef]
  13. Irons, J.R.; Dwyer, J.L.; Barsi, J.A. The next Landsat satellite: The Landsat Data Continuity Mission. Remote Sens. Environ. 2012, 122, 11–21. [Google Scholar] [CrossRef] [Green Version]
  14. Available online: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_SR#description (accessed on 1 June 2021).
  15. Available online: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_TOA (accessed on 1 June 2021).
  16. Available online: http://gsp.humboldt.edu/OLM/Courses/GSP_216_Online/lesson4-1/pan-sharpen.html (accessed on 1 June 2021).
  17. Modica, G.; Solano, F.; Merlino, A.; Di Fazio, S.; Barreca, F.; Laudari, L.; Fichera, C.R. Using landsat 8 imagery in detecting cork oak (Quercus suber L.) Woodlands: A case study in Calabria (Italy). J. Agric. Eng. 2016, 47, 205–215. [Google Scholar] [CrossRef] [Green Version]
  18. Pfeifer, M.; Disney, M.; Quaife, T.; Marchant, R. Terrestrial ecosystems from space: A review of earth observation products for macroecology applications. Glob. Ecol. Biogeogr. 2011, 21, 603–624. [Google Scholar] [CrossRef] [Green Version]
  19. Blaschke, T. Object based image analysis for remote sensing. ISPRS J Photogramm Remote Sens. ISPRS J. Photogramm. Remote Sens. 2010, 65, 2–16. [Google Scholar] [CrossRef] [Green Version]
  20. Solano, F.; Di Fazio, S.; Modica, G. A methodology based on GEOBIA and WorldView-3 imagery to derive vegetation indices at tree crown detail in olive orchards. Int. J. Appl. Earth Obs. Geoinf. 2019, 83, 101912. [Google Scholar] [CrossRef]
  21. Ren, X.; Malik, J. Learning a classification model for segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003. [Google Scholar]
  22. Flanders, D.; Hall-Beyer, M.; Pereverzoff, J. Preliminary evaluation of eCognition object-based software for cut block delineation and feature extraction. Can. J. Remote Sens. 2003. [Google Scholar] [CrossRef]
  23. Tassi, A.; Vizzari, M. Object-Oriented LULC Classification in Google Earth Engine Combining SNIC, GLCM, and Machine Learning Algorithms. Remote Sens. 2020, 12, 3776. [Google Scholar] [CrossRef]
  24. Labib, S.M.; Harris, A. The potentials of Sentinel-2 and LandSat-8 data in green infrastructure extraction, using object based image analysis (OBIA) method. Eur. J. Remote Sens. 2018. [Google Scholar] [CrossRef]
  25. Zhang, C.; Li, X.; Wu, M.; Qin, W.; Zhang, J. Object-oriented Classification of Land Cover Based on Landsat 8 OLI Image Data in the Kunyu Mountain. Sci. Geogr. Sin. 2018, 38, 1904–1913. [Google Scholar]
  26. Achanta, R.; Süsstrunk, S. Superpixels and polygons using simple non-iterative clustering. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  27. Mahdianpari, M.; Salehi, B.; Mohammadimanesh, F.; Homayouni, S.; Gill, E. The first wetland inventory map of newfoundland at a spatial resolution of 10 m using sentinel-1 and sentinel-2 data on the Google Earth Engine cloud computing platform. Remote Sens. 2019, 11, 43. [Google Scholar] [CrossRef] [Green Version]
  28. Xiong, J.; Thenkabail, P.S.; Tilton, J.C.; Gumma, M.K.; Teluguntla, P.; Oliphant, A.; Congalton, R.G.; Yadav, K.; Gorelick, N. Nominal 30-m cropland extent map of continental Africa by integrating pixel-based and object-based algorithms using Sentinel-2 and Landsat-8 data on google earth engine. Remote Sens. 2017, 9, 1065. [Google Scholar] [CrossRef] [Green Version]
  29. Ghorbanian, A.; Kakooei, M.; Amani, M.; Mahdavi, S.; Mohammadzadeh, A.; Hasanlou, M. Improved land cover map of Iran using Sentinel imagery within Google Earth Engine and a novel automatic workflow for land cover classification using migrated training samples. ISPRS J. Photogramm. Remote Sens. 2020, 167, 276–288. [Google Scholar] [CrossRef]
  30. Mohanaiah, P.; Sathyanarayana, P.; Gurukumar, L. Image Texture Feature Extraction Using GLCM Approach. Int. J. Sci. Res. Publ. 2013, 3, 1–5. [Google Scholar]
  31. Godinho, S.; Guiomar, N.; Gil, A. Estimating tree canopy cover percentage in a mediterranean silvopastoral systems using Sentinel-2A imagery and the stochastic gradient boosting algorithm. Int. J. Remote Sens. 2018. [Google Scholar] [CrossRef]
  32. Mananze, S.; Pôças, I.; Cunha, M. Mapping and assessing the dynamics of shifting agricultural landscapes using google earth engine cloud computing, a case study in Mozambique. Remote Sens. 2020, 12, 1279. [Google Scholar] [CrossRef] [Green Version]
  33. Maleknezhad Yazdi, A.; Eisavi, V.; Shahsavari, A. İran Tahran şehri güney bölgesinde kent alanlarının sınıflandırılmasında SVM yöntemi ile hiperspektral görüntü ve tekstür bilgilerinin birlikte kullanılması. İstanbul Üniversitesi Orman Fakültesi Derg. 2016. [Google Scholar] [CrossRef]
  34. Nejad, A.M.; Ghassemian, H.; Mirzapour, F. GLCM Texture Features Efficiency Assessment of Pansharpened Hyperspectral Image Classification for Residential and Industrial Regions in Southern Tehran. J. Geomat. Sci. Technol. 2015, 5, 55–64. [Google Scholar]
35. Mengge, T. Assessing the Value of Superpixel Approaches to Delineate Agricultural Parcels. Master’s Thesis, University of Twente, Enschede, The Netherlands, 2019. [Google Scholar]
  36. Wu, C.F.; Deng, J.S.; Wang, K.; Ma, L.G.; Tahmassebi, A.R.S. Object-based classification approach for greenhouse mapping using Landsat-8 imagery. Int. J. Agric. Biol. Eng. 2016. [Google Scholar] [CrossRef]
37. Nery, T.; Sadler, R.; Solis-Aulestia, M.; White, B.; Polyakov, M.; Chalak, M. Comparing supervised algorithms in Land Use and Land Cover classification of a Landsat time-series. In Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016. [Google Scholar]
  38. Mather, P.; Tso, B. Classification Methods for Remotely Sensed Data, 2nd ed.; CRC Press, Engineering and Technology, Environment and Agriculture: Boca Raton, FL, USA, 2016; ISBN 9781420090741. [Google Scholar] [CrossRef]
  39. Breiman, L. Random forests. Mach. Learn. 2001. [Google Scholar] [CrossRef] [Green Version]
  40. Gislason, P.O.; Benediktsson, J.A.; Sveinsson, J.R. Random forests for land cover classification. In Pattern Recognition Letters; Elsevier Science: New York, NY, USA, 2006. [Google Scholar] [CrossRef]
  41. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259. [Google Scholar] [CrossRef]
  42. Amani, M.; Ghorbanian, A.; Ahmadi, S.A.; Kakooei, M.; Moghimi, A.; Mirmazloumi, S.M.; Moghaddam, S.H.A.; Mahdavi, S.; Ghahremanloo, M.; Parsian, S.; et al. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5326–5350. [Google Scholar] [CrossRef]
  43. Praticò, S.; Solano, F.; Di Fazio, S.; Modica, G. Machine Learning Classification of Mediterranean Forest Habitats in Google Earth Engine Based on Seasonal Sentinel-2 Time-Series and Input Image Composition Optimisation. Remote Sens. 2021, 13, 586. [Google Scholar] [CrossRef]
  44. Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. 2012. [Google Scholar] [CrossRef]
45. Conti, F.; Ciaschetti, G.; Di Martino, L.; Bartolucci, F. An annotated checklist of the vascular flora of Majella National Park (Central Italy). Phytotaxa 2019, 412, 1–90. [Google Scholar] [CrossRef]
46. Di Cecco, V.; Di Santo, M.; Di Musciano, M.; Manzi, A.; Di Cecco, M.; Ciaschetti, G.; Marcantonio, G.; Di Martino, L. The Majella National Park: A case study for the conservation of plant biodiversity in the Italian Apennines. Ital. Bot. 2020, 10, 1–24. [Google Scholar] [CrossRef]
  47. Liberatoscioli, E.; Boscaino, G.; Agostini, S.; Garzarella, A.; Scandone, E. The Majella National Park: An aspiring UNESCO geopark. Geosciences 2018, 8, 256. [Google Scholar] [CrossRef] [Green Version]
  48. Available online: http://www.parks.it/parco.nazionale.majella/Epar.php (accessed on 1 June 2021).
49. Xie, S.; Liu, L.; Zhang, X.; Yang, J.; Chen, X.; Gao, Y. Automatic land-cover mapping using Landsat time-series data based on Google Earth Engine. Remote Sens. 2019, 11, 3023. [Google Scholar] [CrossRef] [Green Version]
50. Nyland, K.E.; Gunn, G.E.; Shiklomanov, N.I.; Engstrom, R.N.; Streletskiy, D.A. Land cover change in the lower Yenisei River using dense stacking of Landsat imagery in Google Earth Engine. Remote Sens. 2018, 10, 1226. [Google Scholar] [CrossRef] [Green Version]
51. Richter, R.; Kellenberger, T.; Kaufmann, H. Comparison of topographic correction methods. Remote Sens. 2009, 1, 184–196. [Google Scholar] [CrossRef] [Green Version]
  52. Soenen, S.A.; Peddle, D.R.; Coburn, C.A. SCS+C: A modified sun-canopy-sensor topographic correction in forested terrain. IEEE Trans. Geosci. Remote Sens. 2005. [Google Scholar] [CrossRef]
  53. Vanonckelen, S.; Lhermitte, S.; Balthazar, V.; Van Rompaey, A. Performance of atmospheric and topographic correction methods on Landsat imagery in mountain areas. Int. J. Remote Sens. 2014. [Google Scholar] [CrossRef] [Green Version]
54. Belcore, E.; Piras, M.; Wozniak, E. Specific alpine environment land cover classification methodology: Google Earth Engine processing for Sentinel-2 data. In Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences-ISPRS Archives, Nice, France, 31 August–2 September 2020. [Google Scholar]
  55. Shepherd, J.D.; Dymond, J.R. Correcting satellite imagery for the variance of reflectance and illumination with topography. Int. J. Remote Sens. 2003. [Google Scholar] [CrossRef]
  56. Burns, P.; Macander, M. Topographic Correction in GEE–Open Geo Blog. Available online: https://mygeoblog.com/2018/10/17/terrain-correction-in-gee/ (accessed on 25 February 2021).
  57. Singh, R.P.; Singh, N.; Singh, S.; Mukherjee, S. Normalized Difference Vegetation Index (NDVI) Based Classification to Assess the Change in Land Use/Land Cover (LULC) in Lower Assam, India. Int. J. Adv. Remote Sens. GIS 2016. [Google Scholar] [CrossRef]
  58. Capolupo, A.; Monterisi, C.; Tarantino, E. Landsat Images Classification Algorithm (LICA) to Automatically Extract Land Cover Information in Google Earth Engine Environment. Remote Sens. 2020, 12, 1201. [Google Scholar] [CrossRef] [Green Version]
59. Polykretis, C.; Grillakis, M.G.; Alexakis, D.D. Exploring the impact of various spectral indices on land cover change detection using change vector analysis: A case study of Crete Island, Greece. Remote Sens. 2020, 12, 319. [Google Scholar] [CrossRef] [Green Version]
60. Diek, S.; Fornallaz, F.; Schaepman, M.E.; de Jong, R. Barest Pixel Composite for agricultural areas using Landsat time series. Remote Sens. 2017, 9, 1245. [Google Scholar] [CrossRef] [Green Version]
  61. Chen, W.; Liu, L.; Zhang, C.; Wang, J.; Wang, J.; Pan, Y. Monitoring the seasonal bare soil areas in Beijing using multi-temporal TM images. In Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS), Anchorage, AK, USA, 20–24 September 2004. [Google Scholar]
  62. Zha, Y.; Gao, J.; Ni, S. Use of normalized difference built-up index in automatically mapping urban areas from TM imagery. Int. J. Remote Sens. 2003. [Google Scholar] [CrossRef]
  63. Bhatti, S.S.; Tripathi, N.K. Built-up area extraction using Landsat 8 OLI imagery. GIScience Remote Sens. 2014, 51, 445–467. [Google Scholar] [CrossRef] [Green Version]
64. Ghosh, D.K.; Mandal, A.C.H.; Majumder, R.; Patra, P.; Bhunia, G.S. Analysis for mapping of built-up area using remotely sensed indices-A case study of Rajarhat Block in Barasat Sadar sub-division in West Bengal (India). J. Landsc. Ecol. 2018. [Google Scholar] [CrossRef] [Green Version]
  65. Rikimaru, A.; Roy, P.S.; Miyatake, S. Tropical forest cover density mapping. Trop. Ecol. 2002, 43, 39–47. [Google Scholar]
66. Rouse, J.W., Jr.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring vegetation systems in the Great Plains with ERTS. In Proceedings of the NASA SP-351, 3rd ERTS-1 Symposium, Greenbelt, Maryland, 29 July 1974; pp. 309–317. [Google Scholar]
  67. Gitelson, A.A.; Kaufman, Y.J.; Merzlyak, M.N. Use of a green channel in remote sensing of global vegetation from EOS- MODIS. Remote Sens. Environ. 1996, 58, 289–298. [Google Scholar] [CrossRef]
  68. Vincini, M.; Frazzi, E.; D’Alessio, P. A broad-band leaf chlorophyll vegetation index at the canopy scale. Precis. Agric. 2008, 9, 303–319. [Google Scholar] [CrossRef]
  69. Kobayashi, N.; Tani, H.; Wang, X.; Sonobe, R. Crop classification using spectral indices derived from Sentinel-2A imagery. J. Inf. Telecommun. 2020. [Google Scholar] [CrossRef]
  70. Available online: https://land.copernicus.eu/pan-european/corine-land-cover (accessed on 1 June 2021).
  71. Grohmann, C.H. Effects of spatial resolution on slope and aspect derivation for regional-scale analysis. Comput. Geosci. 2015. [Google Scholar] [CrossRef] [Green Version]
  72. Meher, P.K.; Sahu, T.K.; Rao, A.R. Prediction of donor splice sites using random forest with a new sequence encoding approach. BioData Min. 2016. [Google Scholar] [CrossRef]
  73. Oshiro, T.M.; Perez, P.S.; Baranauskas, J.A. How many trees in a random forest? In Proceedings of the Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Berlin, Germany, 13–20 July 2012. [Google Scholar]
  74. Ozigis, M.S.; Kaduk, J.D.; Jarvis, C.H.; da Conceição Bispo, P.; Balzter, H. Detection of oil pollution impacts on vegetation using multifrequency SAR, multispectral images with fuzzy forest and random forest methods. Environ. Pollut. 2020. [Google Scholar] [CrossRef]
  75. Probst, P.; Boulesteix, A.L. To tune or not to tune the number of trees in random forest. J. Mach. Learn. Res. 2018, 18, 1–18. [Google Scholar]
76. Ge, J.; Liu, H. Investigation of image classification using HOG, GLCM features, and SVM classifier. In Proceedings of the Lecture Notes in Electrical Engineering, Singapore, 15 October 2020; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2020; Volume 645, pp. 411–417. [Google Scholar]
  77. Available online: https://developers.google.com/earth-engine/guides/image_transforms (accessed on 1 June 2021).
78. Carta della Natura. Available online: https://www.isprambiente.gov.it/it/servizi/sistema-carta-della-natura (accessed on 25 March 2021).
  79. Mueller, J.P.; Massaron, L. Training, Validating, and Testing in Machine Learning. Available online: https://www.dummies.com/programming/big-data/data-science/training-validating-testing-machine-learning/ (accessed on 7 June 2021).
  80. Adelabu, S.; Mutanga, O.; Adam, E. Testing the reliability and stability of the internal accuracy assessment of random forest for classifying tree defoliation levels using different validation methods. Geocarto Int. 2015, 30, 810–821. [Google Scholar] [CrossRef]
  81. Chen, W.; Xie, X.; Wang, J.; Pradhan, B.; Hong, H.; Bui, D.T.; Duan, Z.; Ma, J. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 2017, 151, 147–160. [Google Scholar] [CrossRef] [Green Version]
  82. Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 1991. [Google Scholar] [CrossRef]
  83. Liu, C.; Frazier, P.; Kumar, L. Comparative assessment of the measures of thematic classification accuracy. Remote Sens. Environ. 2007. [Google Scholar] [CrossRef]
  84. Al-Saady, Y.; Merkel, B.; Al-Tawash, B.; Al-Suhail, Q. Land use and land cover (LULC) mapping and change detection in the Little Zab River Basin (LZRB), Kurdistan Region, NE Iraq and NW Iran. FOG Freib. Online Geosci. 2015, 43, 1–32. [Google Scholar]
  85. Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. In Proceedings of the AAAI Workshop—Technical Report; Springer: Berlin, Heidelberg, December 2006. [Google Scholar]
  86. Taubenböck, H.; Esch, T.; Felbier, A.; Roth, A.; Dech, S. Pattern-based accuracy assessment of an urban footprint classification using TerraSAR-X data. IEEE Geosci. Remote Sens. Lett. 2011. [Google Scholar] [CrossRef]
  87. Stehman, S.V. Statistical rigor and practical utility in thematic map accuracy assessment. Photogramm. Eng. Remote Sens. 2001, 67, 727–734. [Google Scholar]
  88. Weaver, J.; Moore, B.; Reith, A.; McKee, J.; Lunga, D. A comparison of machine learning techniques to extract human settlements from high resolution imagery. In Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS), Valencia, Spain, 22–27 July 2018. [Google Scholar]
  89. Hurskainen, P.; Adhikari, H.; Siljander, M.; Pellikka, P.K.E.; Hemp, A. Auxiliary datasets improve accuracy of object-based land use/land cover classification in heterogeneous savanna landscapes. Remote Sens. Environ. 2019. [Google Scholar] [CrossRef]
Figure 1. Location of the Maiella National Park, Abruzzo, Italy (a). L8 median RGB (b) and composite infra-red (c) images. Elevation in meters a.s.l. (d) and slope in percentage (e) derived from Shuttle Radar Topography Mission Digital Elevation Model (SRTM DEM).
Figure 2. Flowchart of the methodology (Pixel-based, PB; Pixel-based including the image textural information, PBT; Object-Based, OB; Object-Based using the L8 15-m panchromatic band, OBP).
Figure 3. L8 median RGB and composite infra-red images before (a,c) and after (b,d) the topographic correction performed in Google Earth Engine.
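For readers who want to reproduce a correction of this kind, the sketch below is a minimal, simplified SCS+C-style illustration in the Earth Engine Python API, not the exact implementation used in this study. It assumes Landsat 8 Collection 2 Level-2 band and property names; the scene ID and the area of interest are hypothetical placeholders.

```python
import ee
ee.Initialize()

aoi = ee.Geometry.Rectangle([13.9, 41.9, 14.4, 42.4])          # placeholder extent
img = ee.Image('LANDSAT/LC08/C02/T1_L2/LC08_189030_20190715')  # hypothetical scene ID
dem = ee.Image('USGS/SRTMGL1_003')                              # SRTM DEM

def scs_c_band(image, band, dem, region):
    """Apply a simplified SCS+C-style correction to one reflectance band."""
    deg2rad = 0.0174533
    slope = ee.Terrain.slope(dem).multiply(deg2rad)
    aspect = ee.Terrain.aspect(dem).multiply(deg2rad)
    zenith = ee.Number(90).subtract(ee.Number(image.get('SUN_ELEVATION'))).multiply(deg2rad)
    azimuth = ee.Number(image.get('SUN_AZIMUTH')).multiply(deg2rad)

    # Illumination condition: cos(Z)cos(s) + sin(Z)sin(s)cos(azimuth - aspect)
    ic = (slope.cos().multiply(zenith.cos())
          .add(slope.sin().multiply(zenith.sin())
               .multiply(aspect.multiply(-1).add(azimuth).cos())))

    # Empirical C parameter: intercept/slope of the reflectance-vs-IC linear fit
    fit = (ic.addBands(image.select(band))
           .reduceRegion(ee.Reducer.linearFit(), region, 30, maxPixels=int(1e9)))
    c = ee.Number(fit.get('offset')).divide(ee.Number(fit.get('scale')))

    # SCS+C: band * (cos(Z)cos(s) + C) / (IC + C)
    return (image.select(band)
            .multiply(slope.cos().multiply(zenith.cos()).add(c).divide(ic.add(c)))
            .rename(band))

nir_corrected = scs_c_band(img, 'SR_B5', dem, aoi)
```

In practice the correction is mapped over every reflectance band and every image of the collection before compositing.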
Figure 4. Out-Of-Bag (OOB) error vs. the number of trees for the four tested approaches (Pixel-based, PB; Pixel-based including the image textural information, PBT; Object-Based, OB; Object-Based using the L8 15-m panchromatic band, OBP).
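The tree-number optimization behind Figure 4 can be scripted by training Random Forest classifiers with increasing tree counts and reading the out-of-bag error estimate from the classifier's explain() output. The sketch below is a rough illustration: the four-point training set and the two predictor names are toy stand-ins, not the study's actual samples or feature stack.

```python
import ee
ee.Initialize()

# Toy stand-in for the 70% training split (the real workflow samples the image stack)
training = ee.FeatureCollection([
    ee.Feature(None, {'NDVI': 0.85, 'NDBI': -0.35, 'class': 0}),
    ee.Feature(None, {'NDVI': 0.80, 'NDBI': -0.30, 'class': 0}),
    ee.Feature(None, {'NDVI': 0.10, 'NDBI': 0.45, 'class': 1}),
    ee.Feature(None, {'NDVI': 0.05, 'NDBI': 0.50, 'class': 1}),
])
bands = ['NDVI', 'NDBI']

def oob_error(num_trees):
    """Train an RF with num_trees trees and return its out-of-bag error estimate."""
    rf = (ee.Classifier.smileRandomForest(numberOfTrees=num_trees)
          .train(features=training, classProperty='class', inputProperties=bands))
    return ee.Dictionary(rf.explain()).get('outOfBagErrorEstimate').getInfo()

tree_counts = [10, 25, 50, 100, 150, 200]
errors = [oob_error(n) for n in tree_counts]
best = tree_counts[errors.index(min(errors))]
print(list(zip(tree_counts, errors)), '-> selected number of trees:', best)
```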
Figure 5. Land Use Land Cover (LULC) classification maps of the study area at 30-m spatial resolution obtained with the different approaches: (a) Pixel-Based (PB); (b) Pixel-Based including the image textural information (PBT); (c) Object-Based (OB); (d) Object-Based using the L8 15-m panchromatic band (OBP). The red polygon marks the region shown in Figure 6. Codes used for LULC classes are described in Table 1.
Figure 6. Land Use Land Cover (LULC) classification outputs from Pixel-based (PB) (a) and Pixel-based, including the image textural information (PBT) (b). SNIC output (c) and LULC classification (d) from the Object-Based (OB) method. SNIC output (e) and LULC classification (f) from the Object-Based method using the L8 15-m panchromatic band (OBP). Codes used for LULC classes are described in Table 1.
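The object-based step illustrated in Figure 6 combines SNIC segmentation with per-object statistics, optionally enriched with GLCM texture. The following is a minimal sketch in the Earth Engine Python API under stated assumptions: the scene ID is a hypothetical placeholder for the multi-band composite, and the seed spacing, SNIC parameters, and texture bands are illustrative values, not the ones tuned in this study.

```python
import ee
ee.Initialize()

# Hypothetical stand-in for the composite (in the study: indices, terrain layers, etc.)
composite = ee.Image('LANDSAT/LC08/C02/T1_L2/LC08_189030_20190715') \
    .select(['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6'])

# 1. GLCM texture requires an integer image; rescaling to 8-bit gray levels is common
gray = composite.select('SR_B5').unitScale(0, 65535).multiply(255).toByte()
glcm = gray.glcmTexture(size=3)
texture = glcm.select(['SR_B5_contrast', 'SR_B5_ent'])   # e.g., contrast and entropy

# 2. SNIC segmentation on a regular seed grid
seeds = ee.Algorithms.Image.Segmentation.seedGrid(36)
snic = ee.Algorithms.Image.Segmentation.SNIC(
    image=composite, size=36, compactness=1, connectivity=8,
    neighborhoodSize=256, seeds=seeds)
clusters = snic.select('clusters')

# 3. Per-object means of the spectral and texture bands, used as classifier features
objects = (composite.addBands(texture).addBands(clusters)
           .reduceConnectedComponents(reducer=ee.Reducer.mean(), labelBand='clusters'))
```

For the OBP variant, the same chain would run at 15 m using the panchromatic band as an additional segmentation and texture input.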
Figure 7. Boxplots showing the median values and the variability of OA (Overall Accuracy), PA (Producer’s Accuracy), UA (User’s Accuracy), and F-score for the four tested approaches after 50 iterations. Each graph shows the median value (Q2), the interquartile range (IQR: Q3–Q1), and the upper and lower limits (1.5 times the IQR). (Pixel-based, PB; Pixel-based including the image textural information, PBT; Object-Based, OB; Object-Based using the L8 15-m panchromatic band, OBP).
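The accuracy figures summarized in Figure 7 are standard confusion-matrix statistics, which GEE exposes directly on classified validation points. The sketch below uses a tiny hand-made table as a stand-in for the classified 30% hold-out points ('class' = reference label, 'classification' = predicted label); in the real workflow the 70/30 split can be redrawn at each of the 50 iterations with FeatureCollection.randomColumn().

```python
import ee
ee.Initialize()

# Toy stand-in for the classified validation points (reference vs. predicted labels)
validated = ee.FeatureCollection([
    ee.Feature(None, {'class': 0, 'classification': 0}),
    ee.Feature(None, {'class': 0, 'classification': 1}),
    ee.Feature(None, {'class': 1, 'classification': 1}),
    ee.Feature(None, {'class': 1, 'classification': 1}),
])

matrix = validated.errorMatrix('class', 'classification')
print('Overall accuracy (OA):', matrix.accuracy().getInfo())
print("Producer's accuracy (PA) per class:", matrix.producersAccuracy().getInfo())
print("User's accuracy (UA) per class:", matrix.consumersAccuracy().getInfo())
print('F-score per class:', matrix.fscore().getInfo())
print('Kappa:', matrix.kappa().getInfo())
```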
Figure 8. Average producer’s (a) and user’s accuracies (b) for the four compared approaches. For descriptive purposes, classes are shown in descending order, from left to right, considering the four approaches’ average accuracy (Pixel-based, PB; Pixel-based including the image textural information, PBT; Object-Based, OB; Object-Based using the L8 15-m panchromatic band, OBP). Codes used for LULC classes are described in Table 1.
Figure 9. Average normalized importance of RF classification features for the four approaches (Pixel-based, PB; Pixel-based including the image textural information, PBT; Object-Based, OB; Object-Based using the L8 15-m panchromatic band, OBP). Codes used for LULC classes are described in Table 1.
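The variable importance values plotted in Figure 9 can be retrieved from the trained Random Forest through explain() and then normalized. The sketch below reuses the same toy training set introduced for the OOB example; it is a stand-in, not the study's feature stack.

```python
import ee
ee.Initialize()

# Toy stand-in for the training split and predictor names
training = ee.FeatureCollection([
    ee.Feature(None, {'NDVI': 0.85, 'NDBI': -0.35, 'class': 0}),
    ee.Feature(None, {'NDVI': 0.80, 'NDBI': -0.30, 'class': 0}),
    ee.Feature(None, {'NDVI': 0.10, 'NDBI': 0.45, 'class': 1}),
    ee.Feature(None, {'NDVI': 0.05, 'NDBI': 0.50, 'class': 1}),
])
rf = (ee.Classifier.smileRandomForest(numberOfTrees=100)
      .train(features=training, classProperty='class', inputProperties=['NDVI', 'NDBI']))

# Per-feature importance, normalized by the total so the values sum to 1
importance = ee.Dictionary(ee.Dictionary(rf.explain()).get('importance'))
total = importance.values().reduce(ee.Reducer.sum())
normalized = importance.map(lambda key, value: ee.Number(value).divide(total))
print(normalized.getInfo())
```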
Table 1. Selected LULC classes, their codes, and descriptions.
LULC Class | Code | Description
Broadleaf forests | BLF | Forest vegetation dominated by different deciduous tree species, above all beech, white oak, Turkey oak, and others.
Bare soil or rocks | BSR | Persistently non-vegetated areas characterized by bare soil or rocks.
Grasslands | GRS | Grassland vegetation dominated by grasses, forbs, and small shrubs in variable proportion, from dense and continuous to sparse and discontinuous, including both semi-natural secondary grasslands, up to ca. 1800 m a.s.l., and primary grasslands at higher altitudes (above the tree limit).
Shrubs | SHR | Shrubs and scrublands mostly resulting from the abandonment of pastures and agricultural areas, locally dominated by brooms, junipers, blackthorn, hawthorn, etc.; at higher altitudes, this class includes the natural potential vegetation of the subalpine belt, mainly dominated by dwarf juniper.
Coniferous | CNF | Coniferous stands, including both natural ones (i.e., Mugo pine-dominated vegetation) and artificial plantations.
Ferns | FER | Pioneer stages of the successional processes occurring in abandoned pastures and agricultural areas, mostly represented by eagle fern-dominated vegetation [Pteridium aquilinum (L.) Kuhn subsp. aquilinum].
Sparsely vegetated areas | SVA | Areas characterized by highly sparse and discontinuous vegetation cover, with a remarkable presence of stones and outcropping rocky substrate.
Pastures | PAS | Rich pastures and hay meadows with high biomass and dense cover values.
Agriculture | AGR | Agricultural areas.
Built-up | BUP | Built-up areas.
Table 2. Spectral indices and related formulas used in the research.
Index | Formula | Author
BSI | $\frac{(\mathrm{SWIR1}+\mathrm{RED})-(\mathrm{NIR}+\mathrm{BLUE})}{(\mathrm{SWIR1}+\mathrm{RED})+(\mathrm{NIR}+\mathrm{BLUE})}$ | Rikimaru et al. [65]
NDBI | $\frac{\mathrm{SWIR1}-\mathrm{NIR}}{\mathrm{SWIR1}+\mathrm{NIR}}$ | Zha et al. [62]
NDVI | $\frac{\mathrm{NIR}-\mathrm{RED}}{\mathrm{NIR}+\mathrm{RED}}$ | Rouse et al. [66]
GARI | $\frac{\mathrm{NIR}-(\mathrm{GREEN}-(\mathrm{BLUE}-\mathrm{RED}))}{\mathrm{NIR}-(\mathrm{GREEN}+(\mathrm{BLUE}-\mathrm{RED}))}$ | Gitelson et al. [67]
CVI | $\frac{\mathrm{NIR}\cdot\mathrm{RED}}{\mathrm{GREEN}^{2}}$ | Vincini et al. [68]
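The Table 2 indices can be reproduced directly in GEE. Below is a minimal sketch in the Earth Engine Python API, assuming Landsat 8 Collection 2 Level-2 band names (SR_B2 to SR_B6); the scene ID is a hypothetical placeholder and no scaling, masking, or compositing is applied here.

```python
import ee
ee.Initialize()

img = ee.Image('LANDSAT/LC08/C02/T1_L2/LC08_189030_20190715')  # hypothetical scene ID
bands = {'BLUE': img.select('SR_B2'), 'GREEN': img.select('SR_B3'),
         'RED': img.select('SR_B4'), 'NIR': img.select('SR_B5'),
         'SWIR1': img.select('SR_B6')}

ndvi = img.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI')
ndbi = img.normalizedDifference(['SR_B6', 'SR_B5']).rename('NDBI')
bsi = img.expression(
    '((SWIR1 + RED) - (NIR + BLUE)) / ((SWIR1 + RED) + (NIR + BLUE))', bands).rename('BSI')
gari = img.expression(
    '(NIR - (GREEN - (BLUE - RED))) / (NIR - (GREEN + (BLUE - RED)))', bands).rename('GARI')
cvi = img.expression('NIR * RED / (GREEN ** 2)', bands).rename('CVI')

# Index stack to be appended to the classification feature space
indices = ee.Image.cat([ndvi, ndbi, bsi, gari, cvi])
```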