1. Introduction
Informal small-scale alluvial gold mining, also known as placer mining, has major social and environmental impacts and has been at the heart of complicated armed conflicts in various parts of the world. It is distinct from subsistence mining in that it uses large machinery to excavate soil and river sediment [1]. When carried out on riverbanks, it leaves large footprints of bare soil along with ponds of water that are used for on-site processing [2,3]. Such ponds are needed to pump ore slurry and wash it under pressure through sluice boxes, where the gold particles are collected. Prior to the 2018 mercury ban [4], amalgamation was intensively used to improve the capture of the finest gold particles, leading to major health hazards [5]. The ban was implemented for the mining sector, but it has not been well enforced, resulting in illegal mercury markets that supply illegal and informal mining [6].
A small-scale placer mining activity is considered formal/legal when the operator obtains a mining title and a program of works (Programa de Trabajos y Obras, PTO). Accordingly, small-scale mining operations must cover less than 150 hectares and are subject to strict land-recovery measures [1]. Unfortunately, it is estimated that more than 70% of the gold production in Colombia comes from informal small- and medium-scale activities whose operators have not obtained legal permission [7,8,9]. This situation has worsened with the increase in gold prices since the year 2000. Despite the major environmental and social impacts of these activities, traditional land surveys remain very challenging in such remote and harsh areas and lack suitable spatial and temporal coverage. Earth observation techniques offer a better means to detect, map, and monitor these extractive activities and assess their impacts [5,10,11,12].
When using optical spaceborne data, cloud cover can hinder analysis, so cloud and cloud-shadow detection is essential before the imagery is used. Unfortunately, bare excavated areas have relatively high reflectance and, together with water ponds, form a mosaic of terrain that is challenging for cloud and cloud-shadow detection [13,14]. There are three major categories of clouds that affect imagery in different manners, namely cumulus, stratus, and cirrus clouds. Cumulus and stratus clouds, often referred to as dense clouds, are the lowest clouds. They have relatively high reflectance and can be easier to detect in satellite imagery than higher cirrus clouds, which appear as detached filaments [15]. Approaches to detect these dense clouds and their shadows vary. For example, each satellite scene can be studied separately, i.e., a mono-temporal approach [16,17,18,19,20,21,22,23,24,25], or a time series of images can be used to identify clouded pixels of relatively higher reflectance, i.e., a multi-temporal approach [26,27]. Any cloud shadow depicted in an image is the projection of a corresponding cloud, and thus the direction of observation plays a large role in the location and geometry of the shadows [28]. This cause-and-effect relationship between a cloud and its shadow is essential to their detection [22]. Various studies have reported the challenges of relying only on spectral information to detect cloud shadows, where false positives can easily occur due to topographical features or water bodies [13]. Accordingly, thermal data, textural characteristics, or geometric characteristics of cloud shadows have been used for improved detection [23,28,29,30]. Other approaches to monitoring clouded areas use synthetic aperture radar (SAR) data, which are not affected by clouds, such as the data acquired by Sentinel-1 of the Copernicus program [31,32].
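The projection relationship between a cloud and its shadow can be illustrated with a minimal sketch. It assumes flat terrain, a nadir-looking sensor (so the viewing-angle contribution is ignored), and a north-up image; the function name `shadow_offset`, its parameters, and the example cloud height are hypothetical and are not taken from the cited methods.

```python
import math

def shadow_offset(cloud_height_m, sun_zenith_deg, sun_azimuth_deg, pixel_size_m=10.0):
    """Return the (d_row, d_col) pixel offset from a cloud pixel to its shadow.

    Assumptions (not from the source): flat terrain, nadir view, and a
    north-up image where rows increase southward and columns eastward.
    """
    # Horizontal ground distance between the cloud and its shadow.
    distance_m = cloud_height_m * math.tan(math.radians(sun_zenith_deg))
    # The shadow is cast on the side opposite to the sun.
    shadow_azimuth = math.radians(sun_azimuth_deg + 180.0)
    d_east = distance_m * math.sin(shadow_azimuth)
    d_north = distance_m * math.cos(shadow_azimuth)
    # Moving north decreases the row index.
    return (-d_north / pixel_size_m, d_east / pixel_size_m)

# Hypothetical example: a cloud at 1000 m with the sun due east at 45 degrees zenith.
d_row, d_col = shadow_offset(1000.0, 45.0, 90.0)
```

With the sun in the east, the offset points west (negative column direction), and its length grows with both cloud height and solar zenith angle, which is why an unknown cloud height makes shadow location ambiguous.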
The advantage of using the Copernicus Sentinel-2 constellation of two satellites (S2A and S2B) is that its data are freely available and have a 10 m resolution for several bands. The Multispectral Instruments (MSIs) are the sensors on board the satellites, with the first data acquisitions dating to 2016. The combined use of the two platforms gives a short revisit time, with an image over Colombia obtained every 5 days. MSIs provide images with thirteen bands. The central wavelength (λ) and bandwidth of each band per sensor are detailed in Table 1. Depending on the band (B), Sentinel-2 data can have a spatial resolution of 10 m, 20 m, or 60 m [33].
A popular source of atmospherically corrected Sentinel-2 (S2) data for Colombia is the Copernicus hub (https://scihub.copernicus.eu/), which uses Sen2Cor, a semi-empirical mono-temporal model for radiometric and atmospheric correction. Using Sen2Cor, the L1C level of the data, i.e., the top-of-atmosphere reflectance, is transformed into Level L2A, which corresponds to surface reflectance. Cloud (dense and cirrus clouds) and shadow detection are available for L1C and L2A products [19,34,35]. For L1C data, dense cloud detection uses B2 (490 nm), and the shortwave infrared (SWIR) bands B10 (1375 nm), B11 (1610 nm), and B12 (2190 nm) help avoid the false inclusion of snow. B10 is also used for the detection of cirrus clouds, as their high altitude can be detected using a band with high atmospheric absorption. Finally, filters are applied to the detected clouds to remove isolated pixels and to fill gaps within clouds [35]. Cloud detection for L2A products, in contrast, uses several steps of threshold filtering with land-cover-related indices to avoid false cloud detections in error-prone regions, such as areas of bare soil [36]. Unfortunately, the cloud detection approach of Sen2Cor has been reported to result in the unsatisfactory detection of dense clouds and their shadows [14,24,37], and has been shown to result in false positives in small-scale mining areas [12].
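The band logic described above for L1C data can be sketched as a per-pixel rule. This is only an illustration of the idea and not the operational L1C or Sen2Cor algorithm: the function name `classify_l1c_pixel` and all three threshold values are hypothetical placeholders, and the post-filtering of isolated pixels and gaps is omitted.

```python
def classify_l1c_pixel(b2, b10, b11, b12,
                       dense_b2_min=0.25,     # hypothetical blue-band threshold
                       snow_swir_max=0.10,    # hypothetical SWIR threshold for snow
                       cirrus_b10_min=0.012): # hypothetical cirrus-band threshold
    """Label one pixel from top-of-atmosphere reflectances on a 0-1 scale."""
    bright = b2 > dense_b2_min
    # Snow is bright in the blue band but dark in B11/B12, unlike dense clouds.
    snow_like = bright and b11 < snow_swir_max and b12 < snow_swir_max
    if bright and not snow_like:
        return "dense_cloud"
    # B10 lies in a strong water-vapour absorption band, so a signal there
    # indicates a high-altitude reflector such as cirrus.
    if b10 > cirrus_b10_min:
        return "cirrus"
    return "clear"

# Hypothetical reflectance values for a bright pixel with a high SWIR response.
label = classify_l1c_pixel(b2=0.40, b10=0.001, b11=0.30, b12=0.25)
```

A rule of this kind also shows why bare excavated soil is problematic: it can be bright in B2 and in the SWIR bands at the same time, so a purely spectral test may label it as dense cloud.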
This paper aims to improve cloud and cloud-shadow detection for mapping small-scale mining, where the areas of interest are bare soil and water ponds, using an approach that is simple, efficient, and based on freely available tools. The procedure consists of two consecutive machine-learning steps. First, a supervised classification detects candidate clouds and shadows; second, the solar-cloud-shadow-sensor geometry and the causal relationship between clouds and their shadows are used to reduce shadow commission error. Various methods that reduce cloud-shadow false positives have already been developed. One “universal” method that can be used for Sentinel-2 data applies an object-based image analysis approach for spatial shape matching of clouds and cloud shadows [22]. Another approach, developed for MODIS data, uses a geometry-based tool to detect potential shadows, followed by a classification, and then matches the two outputs [13]. Other geometry-based approaches have been tailored for specific sensors that include thermal bands [28,38].
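The second step, in which candidate shadows are retained only if a cloud can explain them, can be illustrated with a simplified pixel-based sketch. It is not the implementation used in this work: the function `filter_shadows`, the representation of masks as sets of (row, column) tuples, the single fixed offset, and the `tolerance` parameter are all assumptions made for illustration.

```python
def filter_shadows(cloud_pixels, shadow_pixels, offset, tolerance=1):
    """Keep candidate shadow pixels that lie near the projection of a cloud pixel.

    cloud_pixels, shadow_pixels: sets of (row, col) tuples from the classifier.
    offset: integer (d_row, d_col) shift from a cloud to its expected shadow.
    tolerance: half-width in pixels of the search window around each shadow pixel.
    """
    d_row, d_col = offset
    # Project every candidate cloud pixel to where its shadow should fall.
    projected = {(r + d_row, c + d_col) for r, c in cloud_pixels}
    kept = set()
    for r, c in shadow_pixels:
        neighbourhood = ((r + i, c + j)
                         for i in range(-tolerance, tolerance + 1)
                         for j in range(-tolerance, tolerance + 1))
        # A candidate shadow survives only if a projected cloud pixel is nearby.
        if any(p in projected for p in neighbourhood):
            kept.add((r, c))
    return kept

# Hypothetical masks: two cloud pixels and three candidate shadow pixels.
clouds = {(2, 2), (2, 3)}
candidates = {(5, 0), (5, 1), (9, 9)}
shadows = filter_shadows(clouds, candidates, offset=(3, -2))
```

In this toy case the isolated candidate at (9, 9), which could stand for a dark water pond, has no cloud to explain it and is discarded, which is how the geometric step lowers shadow commission error.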
This paper proposes a simple pixel-based approach that provides a high-quality identification of clouds and their shadows for Sentinel-2 in the context of small-scale mining in Colombia. This work aims to efficiently provide a suitable tradeoff between omission errors, which leave contaminated pixels unmasked, and commission errors, which mask out clear pixels. Although the methodology was tailored to the setting of the study area in the context of small-scale mining, it is scalable and can be a solid basis for developing a more generalized approach. The methodology is tested over an intensively excavated region through a mono-temporal approach because of the highly dynamic characteristics of the excavated areas and the rapid land-cover change that needs to be captured. A validation of the results using images acquired in different seasons was carried out on a well-studied pilot site in the vicinity of the town of El Bagre [5]. The success of this approach is a milestone toward time series analysis of land cover around mining sites, which will lead to an early warning system for the sprawl of excavations, especially in the vicinity of protected or sensitive areas. Such important output is to be shared with stakeholders through MapX (https://mapx.org), an online information and engagement platform that allows the consolidation of data, analysis, and spatial visualization [39]. MapX was developed by the United Nations Environment Programme (UNEP) and UNEP/GRID-Geneva (https://unepgrid.ch).
5. Limitations and Future Work
While cloud dilation is not considered in this work, it can be a suitable approach for Sentinel-2 data to include fuzzy cloud pixels in cloud masks and to overcome parallax errors [14]. An automated cloud dilation approach will be considered in the future to obtain an improved exclusion of pixels affected by clouds.
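As an indication of what such a dilation step could look like, the following minimal sketch grows a cloud mask by a fixed number of pixels using a square window. The function `dilate_mask`, the set-of-tuples mask representation, and the choice of radius are assumptions for illustration; an automated approach would need to select the radius from the data.

```python
def dilate_mask(mask_pixels, radius, shape):
    """Grow a set of (row, col) cloud pixels by `radius` pixels (square window).

    shape: (n_rows, n_cols) of the image, used to clip the result at the edges.
    """
    n_rows, n_cols = shape
    dilated = set()
    for r, c in mask_pixels:
        # Add every pixel in the window around the cloud pixel, clipped to the image.
        for i in range(max(0, r - radius), min(n_rows, r + radius + 1)):
            for j in range(max(0, c - radius), min(n_cols, c + radius + 1)):
                dilated.add((i, j))
    return dilated

# Hypothetical example: one cloud pixel in a 5 x 5 image, buffered by one pixel.
buffered = dilate_mask({(2, 2)}, radius=1, shape=(5, 5))
```

A larger radius captures more of the fuzzy cloud edge but also masks more clear pixels, so the radius directly trades omission error against commission error.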
The approach provides an efficient improvement to cloud and cloud-shadow detection for Sentinel-2 using freely available tools. However, it also has limitations. When a true shadow is located in a relatively dark area and is classified as a shadow along with its surroundings in one geometry, the entire geometry is retained as a shadow after the geometry-based improvement, so these commission errors cannot be excluded. Furthermore, the matching in the second iteration can result in commission errors when candidate shadows are located between a matched cloud and its shadow. Yet all these potential drawbacks occur around areas where true cloud and shadow contamination exists, which limits the area of uncertainty in the results and leaves room for localized refinement of the methodology.
Another limitation of the presented methodology is that it is intended for relatively non-rugged terrain and relatively spatially homogeneous meteorological conditions, where one representative h_emp for cumulus clouds is considered. As such, additional considerations are needed for topography and potential micro-climates that can affect the cloud height with respect to the ground surface. However, the approach is scalable, as it can be adjusted to search for multiple representative h_emp values by considering local maxima for h_emp in heterogeneous areas. These aspects can be considered in the future when needed for other study areas.
As h_emp is an empirical measure based on surface reflectance values, it is of great interest to analyze its correspondence to physical cloud characteristics. A future prospect of the work is to assess this measure’s link to cloud-top and cloud-base heights (thickness) under various scenarios of cloud-top ruggedness. This would require an analysis around areas where meteorological data are available, or the use of satellite data that allow the extraction of cloud 3D geometry, such as geostationary data.