Lithological Mapping of Kohat Basin in Pakistan Using Multispectral Remote Sensing Data: A Comparison of Support Vector Machine (SVM) and Artificial Neural Network (ANN)

Elahi, Fakhar; Muhammad, Khan; Din, Shahab Ud; Khan, Muhammad Fawad Akbar; Bashir, Shahid; Hanif, Muhammad

doi:10.3390/app122312147


by Fakhar Elahi ¹, Khan Muhammad ¹,²,*, Shahab Ud Din ¹, Muhammad Fawad Akbar Khan ¹, Shahid Bashir ¹,³ and Muhammad Hanif ⁴

¹ Intelligent Information Processing Lab, National Centre of Artificial Intelligence, University of Engineering and Technology, Peshawar 25000, Khyber Pakhtunkhwa, Pakistan
² Department of Mining Engineering, University of Engineering and Technology (UET), Peshawar 25000, Khyber Pakhtunkhwa, Pakistan
³ Department of Electrical Engineering, University of Engineering and Technology (UET), Peshawar 25000, Khyber Pakhtunkhwa, Pakistan
⁴ National Centre of Excellence in Geology, University of Peshawar, Peshawar 25000, Khyber Pakhtunkhwa, Pakistan
* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(23), 12147; https://doi.org/10.3390/app122312147

Submission received: 17 October 2022 / Revised: 14 November 2022 / Accepted: 16 November 2022 / Published: 28 November 2022

(This article belongs to the Special Issue Applications of Machine Learning on Earth Sciences)


Abstract: Artificial intelligence (AI)-based multispectral remote sensing is an effective, low-cost tool for improving the accuracy of lithological mapping when supported by ground truthing through traditional mapping techniques. The availability of the dataset, choice of algorithm, cost, accuracy, computational time, data labeling, and terrain features are crucial considerations that researchers continue to explore. In this research, support vector machine (SVM) and artificial neural network (ANN) classifiers were applied to the Sentinel-2 MSI dataset to classify lithologies with subtle compositional differences in remote, inaccessible regions of the Kohat Basin, Pakistan. First, we used principal component analysis (PCA), minimum noise fraction (MNF), and available maps for reliable data annotation to train SVM and ANN models for mapping ten classes (nine lithological units + water). The ANN and SVM results were compared with previous studies in the area and a ground truth survey to evaluate their accuracy. SVM mapped the ten classes with an overall accuracy (OA) of 95.78% and a kappa coefficient of 0.95, compared to 95.73% and 0.95 for ANN classification. The SVM algorithm was more efficient in terms of computational cost, accuracy, and ease of use owing to the features available within Google Earth Engine (GEE). In contrast, ANN required time-consuming data transfer from GEE to Google Cloud before application in Google Colab.

Keywords: ANN; SVM; lithological mapping; machine learning; remote sensing

1. Introduction

Remote sensing sensors record reflected or absorbed electromagnetic (EM) spectra of various wavelengths to identify objects according to their typical response due to varying chemical and physical properties. Each rock type may exhibit distinctive spectral absorption/reflection in the relevant electromagnetic spectrum, according to their mineral compositions [1]. Different minerals have discriminative responses in near-infrared (NIR), short-wave infrared (SWIR), and thermal infrared (TIR) spectral wavelengths [1,2,3]. Mapping lithological units may provide information about the spatial distribution of various rock units and their structural occurrences to interpret and map potential zones of mineralization [4,5,6,7,8,9,10]. Hence, researchers utilize these spectral bands to explore and identify different minerals/lithologies through different techniques applied to remote sensing (RS) data integrated by field surveys and lab geochemical/spectral data [3].

Traditional remote sensing, e.g., band ratios, MNF, decorrelation stretching, and principal component analysis (PCA) [5,6,9,11,12,13,14,15,16,17], has been widely applied to mainly ASTER [5,9,10,12,16,17,18,19,20,21], Landsat 8 OLI [5,10,14], and Sentinel-2 [11,12,13], or their fused combination, to map different lithologies [10,22]. Researchers have reported better results for the use of Sentinel-2 data after the application of these traditional techniques [11]. However, creating geological maps through remote sensing data is still challenging, with complexities involving sub-pixel-level nonlinear mixing of minerals response [23]. In addition, these techniques are time-consuming, requiring considerable manual interpretation and expert knowledge [24]. The mixed albedo from different lithologies, instrumental limitations, and environmental factors invite robust data mining machine learning (ML) algorithms and deep learning (DL) algorithms to reveal detailed geological information [4,8,25,26,27,28,29,30].

Machine learning algorithms have been broadly classified into (1) supervised, (2) unsupervised, (3) self-supervised, and (4) reinforcement learning algorithms [31]. Supervised algorithms, e.g., naive Bayes, random forests, SVM, ANN [32], and DL algorithms [31], such as convolution neural network (CNN) and recurrent neural network (RNN), are used to predict the outputs matching given targets. Unsupervised learning, e.g., k-nearest neighborhood, fuzzy-c-means, PCA, and ICA, learn by finding interesting features in data through reconstruction, transformation, or classification without considering matching of input with targets [32]. The self-supervised class of algorithms match the targets with output data; however, the targets are self-generated from the input data instead of given targets [31]. Reinforcement learning generally involves information retrieval from the changing environment and learning to take the best action, reinforced by a reward maximization mechanism [33]. ML algorithms are a widely used alternative to other techniques for robust and automated classification [3,4,6,30,34]. However, the ML algorithms’ accuracy depends on data labels that can be defective due to lithological complexity or lower spatial resolution [35]. Therefore, unsupervised techniques such as clustering, PCA, and decorrelation stretching (DS) are used in the preprocessing stage to improve the reliability of data annotation before applying the ML algorithm [6,28].

The remote sensing dataset’s spatial/spectral resolution and available bands are crucial for selecting multispectral datasets to minimize data noise [24,35]. The freely available 10 m spatial resolution blue (B2), green (B3), red (B4), and NIR (B8), and 20 m SWIR1 (B11) and SWIR2 (B12) bands of Sentinel-2 MSI dataset in Google Earth Engine (GEE) facilitate economic means of lithological mapping [34,36]. GEE contains observations from aerial imaging systems and multispectral satellite data related to various remote sensing (RS) domains [6,34,37]. The cloud computing feature has reduced computational time and enhanced observation efficiency; in addition, a programming interface is provided for access to data processing features and ML algorithms.

Various lithological formations in the Kohat–Karak regions of Khyber Pakhtunkhwa of Pakistan contain distinctive alluvium, limestone, Jatta Gypsum, and six other lithologies (Figure 1 and Figure S1 [38]). These six lithological formations in the Kohat Plateau, Pakistan, are outlined in Table A1 (Appendix A). Subtle compositional differences in these formations comprise various proportions of sandstone, shale, and mudstone [39], resulting from Himalayan molasse depositions. However, the lithologies of this region are difficult to map through conventional mapping practices due to the complexly deformed, inaccessible, and rugged terrain. For example, a previous study in the same area used conventional remote sensing (RS) techniques for mapping the gypsum lithological unit only [19]. The freely available 10 m spatial resolution blue (B2), green (B3), red (B4), and NIR (B8), and 20 m SWIR1 (B11) and SWIR2 (B12) bands of Sentinel-2 data have rarely been explored for mapping lithologies in such terrain.

Recently, Bachri [4] reported 85% accuracy from SVM classification of 10 lithological classes using Landsat 8 OLI with a digital elevation model (DEM) and geomorphometric attributes of ALOS/PALSAR. Another study reported an SVM accuracy of 83.16% [22] using fused Sentinel-2+ASTER data. The random forest (RF) algorithm has also been considered the best model, with reported accuracies of 91% [40] when applied to Sentinel-2 data and 85.75% [41] when applied to a fusion of Sentinel-2+ASTER+DEM data. In other case studies, the maximum likelihood classifier (MLC) has been reported as the best, with accuracies of 70% [42] and 76% [43] when applied to Sentinel-2 data. Therefore, different ML algorithms perform better in different lithological formations and for different choices of satellite data. Additionally, the computational cost of such studies has rarely been reported. This paper aims to present a lithological mapping solution with the following objectives:

Apply better means of data annotation to obtain higher map accuracy than previous ML lithological mapping solutions at lower computational cost and higher spatial resolution.
Present a novel system architecture to improve the existing geologic map of the Kohat Plateau using Sentinel-2 MSI datasets by (1) extracting training data from PCA, MNF, and previous maps for better annotation, and (2) comparison of SVM vs. ANN ML lithological classification.
Obtain a medium spatial resolution (1:30,000), high-accuracy ML map for a region with subtle compositional differences, as a prospecting tool for further mineral exploration.

2. Materials and Methods

2.1. Geology of the Study Area

The Kohat Plateau within the study area (Figure 1) is characterized by a blend of lithologies owing to a broad spectrum of depositional environments from carbonate platform to marginal marine and again to carbonate platform in the Eocene. An unconformity follows in Oligocene continental settings from the Miocene onwards, as shown in Table A1 (Appendix A) and Figure S1 [38]. The region mainly contains the deformed sedimentary rocks from the Paleocene to Pliocene sequences formed due to the collision of the Eurasian and Indian plates. Tectonically, the Kohat Plateau falls in a compressional regime due to a tectonic collision between the Indian and Asian plates northward of the MBT [44]. The compressional tectonics resulted in thrust sheets in the Kohat Plateau, which predominantly exposes Paleogene–recent strata [45,46]. As a result, thrusts, broad synclines, tight anticline structures, and tight symmetrical folds striking in the east–west direction are found. The fluctuating depositional environments in the Eocene indicate the interplay of tectonics and climate, which becomes predominantly continental from the Miocene onwards in response to Himalayan orogeny [47]. A detailed study of the structural geology and stratigraphy of the Kohat region is discussed by Ali et al. [48]. Apart from Quaternary era surface alluvium depositions, all the tertiary era depositions are ordered from oldest to most recent: Jatta Gypsum, Mami Khel Clay [49], Kohat [50] (Eocene); Murree [51], Kamlial [52] (Miocene); Chinji [52], Nagri [52], and Dhok-Pathan (Pliocene) formations [53]. Jatta Gypsum is the oldest formation, with evaporite origin overlaying the Bahadur Khel salt deposition; Mami Khel/Kuldana formations are the early clasts of erosion from the Himalayan mountains in the Eocene era. The purple-reddish-brown shale containing the high-iron Murree Formation (Miocene) overlays the Kohat Formation. 
The Kamlial Formation (Miocene) is mostly greenish-grey sandstone, similar to the Nagri Formation but with relatively lower shale content. The Chinji Formation has claystone lumps with pointed heaps of sandstone/silty clay reddish clasts bearing high ferric oxide content (hematite). The Nagri Formation consists of micaceous sandstone with 50% sandstone and 50% shale. The Dhok-Pathan Formation contains 70% sandstone and 30% clay shale. The various formations of the Siwalik Group are distinguished by gross sandstone percentages, with the Chinji Formation having less than 50% sand and a greater amount of mudstone, the Nagri Formation more than 50% sand, and the Dhok-Pathan Formation, again, less than 50% sand [47]. The differentiation of these formations would rely on compositional variations of quartz, feldspar, mudstone, and heavy minerals and the presence of mica in lithofragments.

The semiarid climate with outcrop exposures of variable lithological colors and hydrocarbon potential of the area makes the Kohat Plateau favorable for lithological mapping using spectral remote sensing data. However, for automated mapping of these formations through remote sensing, robust ML algorithms and suitable datasets are required to distinguish between subtle variations due to varying colors (due to associated minerals) and compositions of sandstone, shale, and conglomerates (Table A1 in Appendix A).

2.2. Multispectral Data and Google Earth Engine

GEE is a cloud-based platform for processing large-scale geospatial data for mineral mapping, environmental monitoring, and analysis. The GEE provides free and easy access to Landsat and Sentinel satellite datasets [36,37]. Due to GEE development, the research enthusiasm in remote sensing and geospatial multispectral data science has increased [34]. GEE is a free-to-use platform providing access to:

Multispectral remote sensing datasets from various satellites, such as Landsat, Sentinel, and MODIS, along with an explorer web app and other ready-to-use products.
Different AI/ML algorithms with high-speed parallel processing using Google computational infrastructure.
Application programming interfaces (APIs) for the two most popular programming languages, JavaScript and Python.
Programming interface with the development environments.

These GEE core features enable users to visualize and process large-scale multispectral geospatial data with high-speed supercomputers using available machine learning algorithms. Generally, SWIR and TIR bands are considered the most important for distinguishing lithologies [1]. The Sentinel-2 MSI datasets have a spatial resolution of 20 m for SWIR bands but lack TIR bands. In contrast, Landsat 8 OLI datasets contain TIR bands with a spatial resolution of 100 m and therefore have limited spatial detail, i.e., 1:100,000 [54]. Thus, the spectral responses in the TIR band may be diluted due to the mixed response from features due to its limited temporal, spatial, and spectral resolutions [17,23].

Additionally, vegetation and soil cover make it more difficult to map the deposits [24]. The Sentinel-2 satellite with the MSI on board [55] has a high revisit frequency, and its mission coverage provides local, regional, national, and international data. In addition, due to its relatively higher spatial resolution SWIR bands, Sentinel-2 is generally more suitable for mineral exploration and lithological mapping [2]. However, robust ML algorithms are required to deal with limited data (without TIR bands) for identifying various rock types.

Limestone shows spectral variations in the SWIR range (2.10–2.30 µm) and TIR range (10.25–11.65 µm) [17,21]. Gypsum shows a wide range of varying absorption features in SWIR ranges (1.2–1.38 µm, 1.61–1.75 µm, and 2.21 µm) depending upon the low-frequency vibrational modes of associated crystal molecular water, O–H stretching fundamentals, and combinations of H–O–H bending [9]. Similarly, shale displays absorption features at 1.40 µm attributed to OH/H₂O stretches, 1.90 µm related to H₂O stretches, and 2.20 µm due to a combination of the OH-stretching fundamental with Al–OH bending mode [10]. Therefore, Sentinel-2 bearing SWIR bands and high spatial resolution can be used for distinguishing these formations through robust ML classification algorithms, such as SVM and ANN [26].

2.3. Supervised Classification Algorithms

A classification function maps the input data (features) to the output class (or target labels) by minimizing the prediction error, often learning complex input–output relationship patterns. Inputs are represented as n-vectors {X1, X2…Xn}, while outputs are represented as a finite set of k class labels {Y1, Y2…Yk}. Datasets are divided into training and testing sets; models are trained using the training data, while the testing data are used for cross-validation to evaluate the performance of the trained machine learning models [32]. SVM and ANN are among the most widely used supervised classification algorithms in ML applications, providing valuable insights through cloud computing involving geospatial data.

2.3.1. Support Vector Machine (SVM)

The SVM is widely applied in mineral exploration, especially in processing remote sensing data [6,26,29,56]. As defined by Vapnik [57], SVM can solve a quadratic optimization problem by creating nonlinear decision boundaries in high-dimensional variable space [58]. According to basic SVM theory, many hyperplanes can split different classes for a nonlinearly separable dataset, including points from two classes. Therefore, only a subset of “k” training samples, known as support vectors, are used to identify a hyperplane that best divides classes (i.e., the decision boundary). The optimal decision boundary between different classes is defined as those with the maximum margin (distance) $M = \frac{2}{\|w\|}$ between the support vectors. SVM identifies M in nonseparable linear instances while considering a cost parameter C, which provides a penalty for misclassifying support vectors. The objective function must be adjusted to include this penalty component for wide-margined decision boundaries with misclassified support vectors, as shown below.

Minimize

$$\frac{\|w\|}{2} + C \sum_{n=1}^{k} \varepsilon_{n} \qquad (1)$$

Subject to the hyperplane boundary conditions

$$y_{n} (w \cdot x_{n} + b) \geq 1 - \varepsilon_{n}, \quad \varepsilon_{n} > 0, \quad n = 1, 2, 3, \dots, k \qquad (2)$$

where $\|w\|$ is the Euclidean norm of $w$, the vector that controls the hyperplane’s orientation; $b$ is the hyperplane’s offset from the origin; and $\varepsilon_{n}$ is the positive slack variable indicating the error distance between the nth misclassified support vector and its marginal hyperplane. SVM uses an implicit transformation $\varphi(x)$ of input data “x” for cases where classes are not linearly separable [59].

$$Z(x_{i}, x_{j}) = \varphi(x_{i}) \cdot \varphi(x_{j}) \qquad (3)$$

Equation (3) provides the inner product kernel $Z(x_{i}, x_{j})$ of pairwise positions in variable space compared to input variables. Kernel functions allow SVM to handle nonlinear relationships efficiently by creating linear hyperplanes that separate nonlinearly separable support vectors by projecting samples from the original d-dimensional space into potentially infinite-dimensional kernel space. In this situation, the decision function’s form is expressed as:

$$M : M(x) = \operatorname{sgn}(w \cdot Z(x_{i}, x_{j}) + b) \qquad (4)$$

The choice of kernel function is critical for SVM training and classification accuracy. The kernel types available for SVM are the polynomial kernel (PL), radial basis function (RBF) kernel, and sigmoid kernel (SIG). The polynomial kernel with degree 1 is the simplest; hence, it learns patterns faster than the other kernels. The two most essential parameters other than kernel type that affect the performance of SVM are the penalty parameter C and the gamma coefficient. While the penalty parameter limits the error level accepted in the training data, the gamma parameter controls the degree of nonlinearity of the SVM model. For example, a very high value of the cost parameter C results in a complex margin, which reduces the training error. In contrast, a small value of the cost parameter creates a large margin, resulting in a significant error in the training data [60].
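The trade-off between margin width and the misclassification penalty C can be illustrated with a minimal NumPy sketch. It evaluates the objective as written in Equation (1) for a linear hyperplane; the function name, the toy samples, and the choice to compute the slack as max(0, 1 − y(w·x + b)) are assumptions made for illustration and are not taken from the study.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Evaluate the soft-margin objective of Equation (1) for a linear hyperplane.

    Returns (objective, slacks, margin). Hypothetical helper for illustration only.
    """
    w = np.asarray(w, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Slack is zero for samples on the correct side of their margin,
    # and grows linearly with the violation otherwise (assumed form).
    slacks = np.maximum(0.0, 1.0 - y * (X @ w + b))
    norm_w = np.linalg.norm(w)
    objective = norm_w / 2.0 + C * slacks.sum()  # Equation (1) as given in the text
    margin = 2.0 / norm_w                        # M = 2 / ||w||
    return objective, slacks, margin

# Toy two-class data (hypothetical): the third sample violates its margin.
X_demo = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y_demo = [1, -1, 1]
objective, slacks, margin = soft_margin_objective([1.0, 0.0], 0.0, X_demo, y_demo, C=0.02)
```

With a small C such as 0.02, the slack term contributes little to the objective, so a wide margin is tolerated even when some samples violate it; a larger C would weight the same violations more heavily.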

Google Earth Engine’s built-in SVM classifier was used for lithological mapping. The training data selected for training the ML algorithm were directly fed to SVM. The most critical parameters that affect the performance of SVM are the kernel type, cost parameter (C), and gamma coefficient [4]. The parameters were varied randomly using trial and error to check the algorithm’s performance. Different kernels in combination with these cost (penalty) parameters and gamma coefficient were tried. For remote sensing data, the gamma parameter values were always set to the inverse of the number of bands used in the study [28]; therefore, since there were six Sentinel-2 MSI bands, the gamma parameter was set to 1/6.
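As a rough illustration of the kernel settings described above, the sketch below evaluates a polynomial kernel of degree 1 with gamma set to the inverse of the number of bands. The kernel form (gamma · xᵢ·xⱼ + coef0)^degree follows the common LIBSVM convention, which is assumed here rather than confirmed by the text; the reflectance vectors and the zero offset `coef0` are hypothetical.

```python
import numpy as np

def polynomial_kernel(xi, xj, gamma, degree=1, coef0=0.0):
    """Polynomial kernel (gamma * <xi, xj> + coef0) ** degree (assumed LIBSVM-style form)."""
    return (gamma * np.dot(xi, xj) + coef0) ** degree

n_bands = 6              # six Sentinel-2 MSI bands, as stated in the text
gamma = 1.0 / n_bands    # gamma = 1/6

# Hypothetical six-band reflectance vectors for two pixels.
pixel_a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
pixel_b = np.array([0.6, 0.5, 0.4, 0.3, 0.2, 0.1])

k_ab = polynomial_kernel(pixel_a, pixel_b, gamma)
k_ba = polynomial_kernel(pixel_b, pixel_a, gamma)
```

With degree 1 and a zero offset, the kernel reduces to a scaled dot product, which is why this configuration behaves like a linear classifier and trains quickly.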

2.3.2. Artificial Neural Network (ANN)

The ANN classifier is an artificial intelligence (AI) algorithm that mimics how humans classify patterns, learn tasks, and solve problems [61]. The basic architecture of an ANN consists of networks of primitive functions that can receive numerous weighted inputs and are rated on how well they discriminate between the classes in training data (Figure 2).

Models differ depending on the primitive functions and network configurations used [32,62]. The network connection weights are changed, and convergence continues until the error reduction between iterations has reached a decay threshold [26,63]. The important hyperparameters that affect the performance of ANN are the activation function, loss function, optimizer, number of hidden layers, number of nodes, and regularization layers [32]. A deep neural network (DNN) represents machine learning when the system has some level of complexity and uses many layers of nodes to derive high-level functions from input information [64].

An ANN can extract underlying patterns in a data collection without prior information (e.g., a deposit model) and operate with sufficient accuracy even when the data are noisy [25]. The features of Google Cloud and TensorFlow Keras API in Google Colab were used to create an ANN model since GEE had no provision to apply an ANN model. First, we authenticated GEE with Google Colab; the training data were imported from GEE to Colab by uploading them onto the cloud in TF file format to enable working in the TensorFlow ML platform. Then, we created an ANN model by calling TensorFlow Keras API. Keras is a Python-based deep learning API that runs on top of TensorFlow [31]. It is used to create a deep learning model to allow quick experimentation and results generation [31]. A Keras sequential model with one input, four hidden, and an output layer (1–4–1) was created. After each dense layer, a normalization layer was used to prevent model overfitting. A normalization technique used was layer normalization, which finds normalized values in each layer of the ANN model. The normalization statistics of the summed inputs to the neurons within a hidden layer are directly estimated with this method [65]. An ReLU (rectified linear unit) activation function was used to produce input for the next neuron, which is defined as f(x) = max (0, x) for fast computing and learning of the model. We used the softmax activation function at the output layer, a more generalized form of the sigmoid function, which works well with multiclassification problems [66]. To check the performance of the ANN model, we used the categorical cross-entropy error/loss function for faster training while computing the error [66] and better generalization since it generalizes discrete classification problems efficiently [67]. A comparative study of different optimizers used in ANN is discussed by Mustapha et al. [54]; the most widely used Adam and SGD were used to train the model.
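The building blocks named above (ReLU, layer normalization, softmax, and categorical cross-entropy) can be written out in a few lines of NumPy. This is a minimal sketch of the mathematical definitions only, not the Keras model used in the study; the layer normalization here omits the learnable gain and bias that a full implementation would include, and the ten-class example values are hypothetical.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def layer_norm(x, eps=1e-6):
    """Normalize each sample across its features (no learnable gain/bias in this sketch)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(z):
    """Convert logits to class probabilities; the max is subtracted for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    return float(-np.mean(np.sum(y_true * np.log(y_prob + eps), axis=-1)))

# Hypothetical example: one pixel, ten classes, uninformative (all-zero) logits.
n_classes = 10
logits = np.zeros((1, n_classes))
probs = softmax(logits)
one_hot = np.eye(n_classes)[[3]]        # true class index 3, chosen arbitrarily
loss = categorical_cross_entropy(one_hot, probs)
```

For an untrained ten-class model that assigns equal probability to every class, the loss equals ln(10), which gives a useful baseline when reading training curves.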

2.3.3. Accuracy Measures

The overall (OA), producer’s (PA), and user’s (UA) accuracies from the confusion matrix and the kappa coefficient were used to evaluate the performance of the ANN model and SVM classifier [68]. The OA is the ratio of correctly classified pixels in the error matrix to the total pixels present in the error matrix. The UA reports the algorithm’s reliability in terms of commission errors, i.e., pixels classified as class Y that are not associated with class Y. The PA accounts for omission errors relating to specific classes, i.e., pixels associated with class Y that are not correctly classified as class Y. The kappa coefficient is a statistical measure that shows how well a classified map agrees with reference data, accounting for the probability of agreement arising by chance [69]. The kappa coefficient value varies from 0 to 1; values closer to 1 suggest little ambiguity in a pixel’s class identity, and values near 0 indicate high classification uncertainty. Field visits for ground truth observations in the study area were also conducted to validate the results generated by these ML algorithms.
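These accuracy measures can be computed directly from a confusion matrix. The sketch below assumes that rows hold the reference (ground truth) classes and columns hold the predicted classes; that orientation, the function name, and the two-class example matrix are assumptions for illustration, not values from the study.

```python
import numpy as np

def accuracy_measures(confusion):
    """Return (OA, PA per class, UA per class, kappa) from a confusion matrix.

    Assumed layout: rows = reference classes, columns = predicted classes.
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    correct = np.diag(cm)
    oa = correct.sum() / total
    pa = correct / cm.sum(axis=1)   # 1 - omission error, per reference class
    ua = correct / cm.sum(axis=0)   # 1 - commission error, per predicted class
    # Chance agreement estimated from the row and column marginals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, pa, ua, kappa

# Hypothetical two-class confusion matrix.
example = [[45, 5],
           [10, 40]]
oa, pa, ua, kappa = accuracy_measures(example)
```

In this example the overall accuracy is 0.85 while kappa is 0.70, showing how kappa discounts the agreement that would be expected by chance.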

3. Mapping Lithologies in the Kohat Plateau Using SVM and ANN

A schematic representation of the methodology is presented in Figure 3, starting with processing Sentinel-2 MSI data using the GEE platform. Atmospherically corrected multispectral Sentinel-2 MSI data were taken from the GEE dataset. Date (only images from 2016 to 2020), cloud (pixels < 5% cloud), and vegetation (pixels < 5% vegetation) filters were applied to select pertinent data in the study area. In addition, we used a median filter which sorts all the pixels in ascending order and selects the median reflectance value for each band. Median filtering is a nonlinear technique that can help preserve the sharp features in an image by filtering the noise.
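The per-pixel median compositing step can be sketched with NumPy as follows. The array layout (time, band, row, column), the use of NaN to mark pixels removed by the cloud and vegetation filters, and the sample values are all assumptions for illustration; the study performed this step inside GEE.

```python
import numpy as np

def median_composite(stack):
    """Per-pixel, per-band median over the time axis, ignoring masked (NaN) observations.

    stack: array of shape (time, bands, rows, cols); hypothetical layout.
    """
    return np.nanmedian(stack, axis=0)

# Hypothetical stack: 3 acquisition dates, 1 band, 1 x 2 pixels.
stack = np.array([
    [[[0.20, 0.30]]],
    [[[0.90, np.nan]]],   # bright outlier in pixel 0, masked observation in pixel 1
    [[[0.25, 0.50]]],
])
composite = median_composite(stack)
```

The outlier value of 0.90 does not affect the composite for the first pixel, which illustrates why a median is preferred over a mean for suppressing residual noise.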

3.1. Spectral Features of Lithologies in the Region

The reflectance values of all the training samples of Sentinel-2 data sampled from GEE for each class are shown in Figure 4. The plot shows normalized values, obtained by dividing the samples’ reflectance values by the peak reflectance value. The highest value of 1 is shown by alluvium at 1.61 µm, while water shows the strongest absorption (smallest reflectance value of 0.05) at 2.2 µm compared with the other classes. The normalized reflectance of each class increased from 0.50 to 0.83 µm; all formations have a positive gradient (slope) from 0.83 to 1.61 µm, whereas water has a negative gradient in that range. The reflectance of the Kohat Formation also declines slightly from 0.56 to 0.66 µm. There is a sharp decline in the reflectance of the Jatta Gypsum Formation after 1.61 µm. The overall average reflectance of Kamlial Sandstone, Dhok Patan, alluvium, and Jatta Gypsum was higher than that of the Nagri, Kohat, Murree, Mami Khel, and Chinji formations. These reflectance values alone could not distinguish between the lithologies of interest and therefore invited the use of machine learning algorithms for better delineation and further analysis.

3.2. Preprocessing of Data

The ML task consists of (a) preprocessing, (b) training, and (c) testing the model. Data preprocessing is necessary to transform available data into a format containing only relevant information related to the problem [32]. For example, MNF [5] is commonly used for denoising remote sensing data. It converts a noisy data cube into output images ordered by gradually increasing noise level, so image quality gradually decreases across the MNF output images. MNF is a linear transform performed in two steps: (1) Noise whitening is performed, i.e., the noise in the data is rescaled and decorrelated using a noise covariance matrix, so that the noise has unit variance and no band-to-band correlations. (2) A standard PCA transform is performed on the noise-whitened data (Figure 3). PCA is a widely accepted statistical technique for the preprocessing stage that transforms raw multivariate, often intercorrelated, dataset variables into a new set of uncorrelated variables represented by a group of principal components. The first principal component contains the most variability in the data, and each subsequent component represents less of the variance in the data [32]. PCA enhances spectral characteristics of surface material by minimizing the irradiance effects that dominate all bands, removing redundant data of different bands, and confining information within a few crucial bands [18]. These bands are retained as input to the ML algorithms for further analysis. Training samples were selected carefully by analyzing the vegetation (Figure 5) and comparing (1) polygons for all classes (Figure 5) from the previous geological map (Figure S1), (2) the chosen principal components (Figure 6), and (3) MNF (Figure 6). The NIR (B8) and SWIR (B11 and B12) bands of Sentinel-2 were used to classify lithological units. The total area of the study region is 6671.8 km², extending 105.45 km along easting and 63.27 km along northing.
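The two transforms described above can be sketched in NumPy. This is a simplified illustration, assuming pixels are arranged as rows of band values and that the noise covariance matrix is supplied by the caller (in practice it must be estimated from the imagery); the function names and the synthetic data are hypothetical and do not reproduce the processing used in the study.

```python
import numpy as np

def pca_transform(pixels):
    """PCA via eigen-decomposition of the band covariance matrix.

    pixels: (n_samples, n_bands). Returns (scores, explained_variance_ratio).
    """
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # sort components by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return centered @ eigvecs, eigvals / eigvals.sum()

def noise_whitener(noise_cov):
    """Matrix W such that W.T @ noise_cov @ W is the identity (step 1 of MNF)."""
    eigvals, eigvecs = np.linalg.eigh(noise_cov)
    return eigvecs / np.sqrt(eigvals)          # scales each eigenvector column

def mnf_transform(pixels, noise_cov):
    """MNF sketch: whiten the noise, then apply standard PCA (step 2)."""
    whitened = (pixels - pixels.mean(axis=0)) @ noise_whitener(noise_cov)
    return pca_transform(whitened)

# Synthetic six-band data (hypothetical): one shared signal plus small band noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(500, 1)) @ np.ones((1, 6))
pixels = signal + 0.05 * rng.normal(size=(500, 6))
scores, explained = pca_transform(pixels)
mnf_scores, mnf_explained = mnf_transform(pixels, 0.0025 * np.eye(6))
```

Because the six synthetic bands share one underlying signal, nearly all of the variance falls in the first component, which mirrors the band redundancy that PCA is used to remove.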

Samples from the selected polygons were converted into 30 m pixels using GEE built-in function to assign relevant labels before classification, as shown in Table 1. Out of 24,340 pixels, 70% were selected randomly for training, while 30% were used as test dataset to evaluate the machine learning model.
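A random 70/30 split of the labeled pixels can be sketched as follows. The function name and the random seed are hypothetical; only the pixel count and the split fractions come from the text.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.7, seed=42):
    """Randomly partition sample indices into training and test sets."""
    rng = np.random.default_rng(seed)          # seed is an arbitrary, illustrative choice
    shuffled = rng.permutation(n_samples)
    n_train = int(round(n_samples * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]

# 24,340 labeled pixels, as reported in the text.
train_idx, test_idx = split_indices(24340)
```

With these fractions, 17,038 pixels fall in the training set and 7302 in the test set.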

4. Results

Hyperparameters used for training the respective algorithms are presented in Table 2 and were obtained through trial and error. The most influential hyperparameters affecting the performance of SVM were kernel type, cost parameter, and gamma value. The SVM performed best using a polynomial kernel of degree 1, a cost value of 0.02, and a gamma value of 1/6 (reported as 0.16); consequently, an accuracy greater than 95% was achieved for both training and evaluation datasets (Table 3). For ANN, we used an Adam optimizer with a learning rate of 0.0001; the SGD optimizer was nearly three times more computationally expensive during training than the Adam optimizer.

The same training samples, collected after preprocessing the data, were fed to both algorithms. Both algorithms reported good results, with training and validation accuracy of more than 90%. Table 3 compares the training accuracy, validation accuracy, and kappa coefficient of SVM with ANN. Both algorithms agree with the training dataset by having a kappa coefficient value of 0.95. SVM offers the training and testing accuracy of 95.98% and 95.61%, similar to respective accuracies of 94.48% and 95.73% for ANN.

Table 4 and Table 5 report the PA and UA of ANN and SVM for all the lithological units. The UA for all lithologies was greater than 90% for the ANN model except for the Mami Khel and Kamlial formations, with accuracies of 88.1% and 88.6%, respectively. The highest UAs shown by the ANN model were 99.8% for water and 99.5% for Jatta Gypsum. The PA of ANN was greater than 90% for all lithologies except for the Nagri and Murree formations, with accuracies of 88.7% and 77.4%, respectively. ANN showed the highest PAs of 99.5% for Jatta Gypsum and 99.1% for the Kohat Formation. The SVM UA for all lithologies was greater than 90% except for the Murree Formation, with a UA of 84.9%. The highest UAs reported by SVM were 100% for water and 99.6% for Jatta Gypsum. The PA of SVM was greater than 90% for all lithologies except for the Murree Formation (81.9%), with the highest value of 99.3% for Jatta Gypsum. The overall accuracy (OA) was 95.78% for SVM and 95.73% for ANN.

Figure 7 shows the classified map generated by SVM and ANN, with the Jatta Gypsum deposit highlighted in orange. A previous study within the region [19] only focused on Jatta Gypsum outcrops; this study mapped nine lithologies, including Jatta Gypsum. Ground validation of the mapped area was carried out during a field visit along a north–northeast to south–southwest transect (Figure 8). The lithostratigraphic units observed in the field were: Bahadur Khel Salt, Jatta Gypsum, Mami Khel Clay, Kohat, Murree, Kamlial, Chinji, Nagri, and Dhok Patan, in conformity with the ML maps. Jatta Gypsum overlaid the Bahadur Khel Salt and conformed with the previously published maps and ground exposures. Kohat Limestone capped most outcrops in the mapped area. The Murree Formation is thin in the mapped area and has limited ground exposure. Thick deposits of the Siwalik Group (the Chinji, Nagri, and Dhok Patan formations) were observed along the transect segment between Banda Daud Shah and Karak. The Chinji Formation was overlain by the Nagri Formation, which is overlain by the Dhok Patan Formation. Some major differences between the ANN and SVM classification maps are evident along the eastern extents between the Kamlial, Nagri, and Dhok Patan formations, with minor deviations near the southwestern end where the Jatta Gypsum Formation contacts the Kamlial and Chinji formations.

5. Discussion

The previously published remote sensing map for this region [19] mapped only the Jatta Gypsum Formation using ASTER data. The challenging problem of mapping other lithologies with high mineralogical similarity was overcome through AI algorithms using the relatively limited bands of Sentinel-2 MSI data. The classified maps improve on the previously published geological map (Figure S1) in accuracy and lithological detail, as shown in Figure 7 and Figure 8. The PA of both algorithms was high for every class except the Murree Formation. This may be because the formation is thinly exposed on the ground compared with the other formations and may have been affected by atmospheric effects, vegetation cover, soil cover, the spectral and spatial resolution of the image, and sub-pixel heterogeneity in the chemical and mineralogical composition of the rock. The UA of both algorithms was greater than 90% for most rock types (Table 4 and Table 5) and agreed well with the previously published GSP maps. SVM achieved a marginally higher OA than ANN (95.78% vs. 95.73%) and can therefore be considered a more generalized representation of the area. In addition, the high computational efficiency of GEE and the high-resolution Sentinel-2 data are further advantages. The SVM classifier took five minutes to train and display results. Moreover, these accuracies and this cost-effective solution were better than those of recently reported ML-based lithological mapping studies [22,40,41,42,43,70,71]. The ANN with four hidden layers, layer normalization applied to the input and each hidden layer, a categorical cross-entropy loss function, and the Adam optimizer showed the best results, with a training accuracy of 94.48%. However, the training time for this ANN model was 3 h. ANN was time-consuming because GEE has no built-in library for ANN; it therefore required the TensorFlow Keras API through Google Colab. GEE was linked to Colab, the training data were uploaded to Google Cloud in TF format, and the predictions were uploaded back to GEE for visualization. This process took 8 h of data transfer for the same data size, compared with 5 min for SVM.
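To make the ANN configuration described above concrete, the following sketch traces a single forward pass through a network with four hidden layers, layer normalization, ReLU activations, a softmax output, and a categorical cross-entropy loss. It is written in plain Python for illustration and is not the authors' TensorFlow Keras model. The hidden-layer widths, the six-band input, the placement of normalization after each activation, the random untrained weights, and the example pixel are all assumptions; the learnable gain and offset of layer normalization and the Adam update step are omitted.

```python
import math
import random

N_BANDS = 6                 # assumption: six Sentinel-2 input bands
HIDDEN = [64, 64, 32, 32]   # hypothetical widths; the paper states only "four hidden layers"
N_CLASSES = 10              # nine lithological units + water


def layer_norm(x, eps=1e-5):
    """Normalize one sample to zero mean and unit variance across its features."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dense(x, weights, biases):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, one_hot):
    """Loss for one sample: minus the log-probability of the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


def init_layer(rng, n_in, n_out):
    """Random (untrained) weights with He-style scaling, zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
sizes = [N_BANDS] + HIDDEN
hidden_layers = [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
output_layer = init_layer(rng, HIDDEN[-1], N_CLASSES)


def forward(pixel):
    """Input normalization, four normalized ReLU layers, then a softmax output."""
    h = layer_norm(pixel)
    for weights, biases in hidden_layers:
        h = layer_norm(relu(dense(h, weights, biases)))
    weights, biases = output_layer
    return softmax(dense(h, weights, biases))


pixel = [0.12, 0.15, 0.18, 0.25, 0.30, 0.27]   # hypothetical surface reflectances
probs = forward(pixel)

label = [0.0] * N_CLASSES
label[0] = 1.0                                  # hypothetical true class
loss = categorical_cross_entropy(probs, label)
print("class probabilities:", [round(p, 3) for p in probs])
print("cross-entropy loss:", round(loss, 3))
```

Training would repeat this pass over all labeled pixels and adjust the weights to reduce the loss, which is the step the study delegated to the Adam optimizer in Keras.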

The region is a major hydrocarbon- and mineral-producing area; such automated mapping tools can therefore support future exploration strategies beyond this region. The study can be extended using explainable AI [72] to assess what the model has learned, in terms of how the input variables (bands) relate to the outputs (the various formations). Further, the surface maps can be combined with AI-based spatial estimation models [72] applied to subsurface geochemical and geophysical data to develop 3D geological models of potential mineralized zones. These approaches will help the mineral exploration and mining industry move toward Industry 4.0 [73] through IoT [74,75,76] and blockchain [77] solutions for secure data sharing within the mining industry.

6. Conclusions

This paper compares two of the most popular supervised ML algorithms (SVM and ANN) for lithological mapping of different rock types using the RGB, NIR, and SWIR bands of Sentinel-2 multispectral remote sensing data. Training samples were collected using previously available geological maps, and unsupervised techniques were applied to the Sentinel-2 data for data annotation. A map of nine lithological units and water was generated with higher accuracy and detail than the previously reported map of the Jatta Gypsum, which was based on conventional band ratios and PCA applied to ASTER satellite RS data. The results show that both algorithms map these lithologies with >95% OA and a kappa coefficient of 0.95. The accuracies also exceeded those of recently reported ML lithological mapping studies. SVM, with its lower computational cost, is therefore the best algorithm for this case, learning the features from the limited RGB, NIR, and SWIR bands of Sentinel-2 multispectral data. In the case of ANN, transferring the data between GEE and Google Colab took 8 h, in addition to 3 h of training, making it a time-consuming process. The potential of Sentinel-2 data has been reinforced by the strength of the SVM classifier and better annotation, an approach that can be extended to differentiate similar lithologies beyond the region of interest with high accuracy and lower financial and computational cost. In the era of Industry 4.0, the work can be further extended to lineament mapping combined with subsurface geophysical features to develop an automated 3D geological model with AI-based spatial estimation techniques.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app122312147/s1, Figure S1: Geol Map of Kohat Plateau.

Author Contributions

Conceptualization, K.M., S.B. and F.E.; methodology, K.M., M.F.A.K. and F.E.; software, M.F.A.K., F.E., S.U.D. and K.M.; validation, M.H., K.M. and F.E.; formal analysis, K.M. and F.E.; investigation, S.U.D. and F.E.; resources, K.M. and S.B.; data curation, M.F.A.K., S.U.D. and F.E.; writing—original draft preparation, F.E. and K.M.; writing—review and editing, K.M., S.B. and M.H.; visualization, F.E.; supervision, K.M. and S.B.; project administration, K.M.; funding acquisition, K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Higher Education Commission, Pakistan, grant number “National Centre of AI”, and the APC was funded by the authors.

Data Availability Statement

Data supporting reported results can be found at the Google Earth Engine website.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Stratigraphic sequencing and mineralogy of various formations in the region of interest.

Era	Group	Sub-Group	Formation	Description [39]	Mineralogy
Pliocene	Siwalik Group	Middle Siwalik	Dhok Patan	Upper member: sandstone, light-gray; clay, light-reddish-brown and gray; conglomerate.	SiO₂: 55.2–68.35%, Al₂O₃: 12.54–14.59%, Fe₂O₃: 3.07–6.03%, MgO: 1.8–4.03%, CaO: 5.08–7.86%, Na₂O: 2.35–2.61%, K₂O: 1.51–2.91%, MnO: 0.06–0.96%, TiO₂: 0.36–0.67%, P₂O₅: 0.078–0.159% [53].
			Dhok Patan	Lower member: sandstone, micaceous; conglomerate lenses and, basal cobble beds. Formation = 70% sandstone and 30% clay.
			Nagri	Sandstone, dark grey, micaceous, abundant in mafic minerals; conglomerate lenses. Clay, brownish-greyish-red, yellowish-brown and orange, silty, nodular. Formations = 50% sandstone; 50% shale. Field differentiation between Nagri and Dhok Patan is difficult.	Quartz = 43.9–63.4%, feldspar = 24.3–36.3%, and lithofragments = 11.7–25.6%. High in mafic silicates pyroxene, amphibole, olivine, and mica. Mica content in Nagri Formation ranges from 1–8% at Bahadar Khel anticline of the total frameworks [52].
		Lower Siwalik	Chinji	Claystone: pointed heaps, mafic contribution of 23 to 47% (mudstone) and 56 to 69% (sandstone).	Quartz = 44–59%, feldspar = 24–32%, and lithotypes = 12–32% at the Bahadar Khel anticline [52].
Miocene	Rawalpindi Group		Kamlial	Mostly sandstone with low shale. Greenish-gray to grayish-green, fine- to coarse-grained sandstone; conglomerate lenses; micaceous; abundant mafic minerals. Clay, brownish-grey, green, and brownish-red. Beds of silty clay, siltstone, and claystone.	Quartz = 50–60%, feldspar = 22–25%, and mica 3–15%; mostly biotite. Traces of several heavy minerals exist, including epidote, garnet, monazite, ilmenite, rutile, apatite, chromite, and fluorite [52].
Miocene	Rawalpindi Group		Murree	Sandstone, purple, dark-grayish-brown, greenish-gray, medium to coarse-grained, conglomeratic. Shale, purple and reddish-brown.	Quartz = 66–89%, carbonate = 1–25%, and clays = 1–21%. Sandstone is arenite because all sandstone samples contain less than 15% matrix. Quartz = 25–40%, rock fragments = 16–40%, and feldspar 4–11%. Matrix from 1–10% with iron [51].
Eocene	Chahrat Group		Kohat	Habib Rahi Limestone member: limestone; Sadkal member: shale, green, greenish-gray; Kaladhand member: limestone, thin-bedded; interbedded with shale; foraminifera common.	>95% as calcium carbonate [50].
	Chahrat Group		Mami Khel	Clay, brownish-red, silty; some beds of sandstone and conglomerate. Claystone + siltstone with no significant sandstone.	Quartz = 35%, feldspar = 3%, rock fragments = 20%. Heavy minerals include tourmaline, zircon, garnet, epidote, sphene and apatite. Hematite and calcite are the dominating cementing material with minor chlorite [49].
			Jatta Gypsum	Jatta Gypsum: gypsum, bedded to massive.	Gypsum

References

1. Gupta, R.P. Spectra of Minerals and Rocks. In Remote Sensing Geology; Springer: Berlin/Heidelberg, Germany, 2003; pp. 33–52. ISBN 978-3-662-05283-9.
2. van der Meer, F.D.; van der Werff, H.M.A.; van Ruitenbeek, F.J.A. Potential of ESA’s Sentinel-2 for geological applications. Remote Sens. Environ. 2014, 148, 124–133.
3. van der Meer, F.D.; van der Werff, H.M.A.; van Ruitenbeek, F.J.A.; Hecker, C.A.; Bakker, W.H.; Noomen, M.F.; van der Meijde, M.; Carranza, E.J.M.; de Smeth, J.B.; Woldai, T.; et al. Multi- and hyperspectral geologic remote sensing: A review. Int. J. Appl. Earth Obs. Geoinf. 2012, 14, 112–128.
4. Bachri, I.; Hakdaoui, M.; Raji, M.; Teodoro, A.C.; Benbouziane, A.; Cl, A. Machine Learning Algorithms for Automatic Lithological Mapping Using Remote Sensing Data: A Case Study from Souk Arbaa Sahel, Sidi Ifni Inlier, Western Anti-Atlas, Morocco. ISPRS Int. J. Geo-Inf. 2019, 8, 248.
5. Hamimi, Z.; Hagag, W.; Kamh, S.; El-Araby, A.M.A. Application of remote-sensing techniques in geological and structural mapping of Atalla Shear Zone and Environs, Central Eastern Desert, Egypt. Arab. J. Geosci. 2020, 13, 414.
6. Khan, M.F.A.; Muhammad, K.; Bashir, S.; Ud Din, S.; Hanif, M. Mapping Allochemical Limestone Formations in Hazara, Pakistan Using Google Cloud Architecture: Application of Machine-Learning Algorithms on Multispectral Data. ISPRS Int. J. Geo-Inf. 2021, 10, 58.
7. Köhler, M.; Hanelli, D.; Schaefer, S.; Barth, A.; Knobloch, A.; Hielscher, P.; Cardoso-Fernandes, J.; Lima, A.; Teodoro, A.C. Lithium potential mapping using artificial neural networks: A case study from Central Portugal. Minerals 2021, 11, 1046.
8. Merembayev, T.; Kurmangaliyev, D.; Bekbauov, B.; Amanbek, Y. A Comparison of Machine Learning Algorithms in Predicting Lithofacies: Case Studies from Norway and Kazakhstan. Energies 2021, 14, 1896.
9. Öztan, N.S.; Süzen, M.L. Mapping evaporate minerals by ASTER. Int. J. Remote Sens. 2011, 32, 1651–1673.
10. Sekandari, M.; Aminpour, S.M.; Masoumi, I.; Pour, A.B.; Muslim, A.M.; Rahmani, O.; Hashim, M.; Zoheir, B.; Pradhan, B.; Misra, A. Application of Landsat-8, Sentinel-2, ASTER and WorldView-3 Spectral Imagery for Exploration of Carbonate-Hosted Pb-Zn Deposits in the Central Iranian Terrane (CIT). Remote Sens. 2020, 12, 1239.
11. Tangestani, M.H.; Shayeganpour, S. Mapping a lithologically complex terrain using Sentinel-2A data: A case study of Suriyan area, southwestern Iran. Int. J. Remote Sens. 2020, 41, 3558–3574.
12. Salehi, S.; Mielke, C.; Brogaard Pedersen, C.; Dalsenni Olsen, S. Comparison of ASTER and Sentinel-2 spaceborne datasets for geological mapping: A case study from North-East Greenland. Geol. Surv. Denmark Greenl. Bull. 2019, 43, 1–6.
13. El Atillah, A.; El Morjani, Z.E.A.; Souhassou, M. Use of the Sentinel-2A Multispectral Image for Litho-Structural and Alteration Mapping in Al Glo’a Map Sheet (1/50,000) (Bou Azzer-El Graara Inlier, Central Anti-Atlas, Morocco). Artif. Satell. 2019, 54, 73–96.
14. Tripathi, M.K. Lithological Mapping using Digital Image Processing Techniques on Landsat 8 OLI Remote Sensing Data in Jahajpur, Bhilwara, Rajasthan. In Proceedings of the 2nd International Conference on Intelligent Communication and Computational Techniques (ICCT), Jaipur, India, 28–29 September 2019; pp. 43–48.
15. Traore, M.; Çan, T.; Tekin, S. Discrimination of Iron Deposits Using Feature Oriented Principal Component Selection and Band Ratio Methods: Eastern Taurus /Turkey. Int. J. Environ. Geoinform. 2020, 7, 147–156.
16. Rowan, L.C.; Schmidt, R.G.; Mars, J.C. Distribution of hydrothermally altered rocks in the Reko Diq, Pakistan mineralized area based on spectral analysis of ASTER data. Remote Sens. Environ. 2006, 104, 74–87.
17. Rowan, L.C.; Mars, J.C. Lithologic mapping in the Mountain Pass, California area using Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data. Remote Sens. Environ. 2003, 84, 350–366.
18. Crósta, A.P.; De Souza Filho, C.R.; Azevedo, F.; Brodie, C. Targeting key alteration minerals in epithermal deposits in Patagonia, Argentina, using ASTER imagery and principal component analysis. Int. J. Remote Sens. 2003, 24, 4233–4240.
19. Khan, A.; Faisal, S.; Shafique, M.; Khan, S.; Bacha, A.S. ASTER-based remote sensing investigation of gypsum in the Kohat Plateau, north Pakistan. Carbonates Evaporites 2020, 35, 3.
20. Rajendran, S.; Hersi, O.S.; Al-Harthy, A.; Al-Wardi, M.; El-Ghali, M.A.; Al-Abri, A.H. Capability of advanced spaceborne thermal emission and reflection radiometer (ASTER) on discrimination of carbonates and associated rocks and mineral identification of eastern mountain region (Saih Hatat window) of Sultanate of Oman. Carbonates Evaporites 2011, 26, 351–364.
21. Nasir, S.; Rajendran, S. ASTER Spectral Sensitivity of carbonate rocks—Study in Sultanate of Oman. Adv. Sp. Res. 2014, 53, 656–673.
22. Kabolizadeh, M.; Rangzan, K.; Mousavi, S.S.; Azhdari, E. Applying optimum fusion method to improve lithological mapping of sedimentary rocks using Sentinel-2 and ASTER satellite images. Earth Sci. Inform. 2022, 15, 1765–1778.
23. Pal, M.; Rasmussen, T.; Porwal, A. Optimized Lithological Mapping from Multispectral and Hyperspectral Remote Sensing Images Using Fused Multi-Classifiers. Remote Sens. 2020, 12, 177.
24. Xia, G.S.; Wang, Z.; Xiong, C.; Zhang, L. Accurate annotation of remote sensing images via active spectral clustering with little expert knowledge. Remote Sens. 2015, 7, 15014–15045.
25. Brown, W.M.; Gedeon, T.D.; Groves, D.I.; Barnes, R.G. Artificial neural networks: A new method for mineral prospectivity mapping. Aust. J. Earth Sci. 2000, 47, 757–770.
26. Cracknell, M.J.; Reading, A.M. Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information. Comput. Geosci. 2014, 63, 22–33.
27. Oh, H.; Lee, S. Application of Artificial Neural Network for Gold—Silver Deposits Potential Mapping: A Case Study of Korea. Nat. Resour. Res. 2010, 19, 103–124.
28. Othman, A.A.; Gloaguen, R. Integration of spectral, spatial and morphometric data into lithological mapping: A comparison of different Machine Learning Algorithms in the Kurdistan Region, NE Iraq. J. Asian Earth Sci. 2017, 146, 90–102.
29. Othman, A.A.; Gloaguen, R. Improving lithological mapping by SVM classification of spectral and morphological features: The discovery of a new chromite body in the Mawat ophiolite complex (Kurdistan, NE Iraq). Remote Sens. 2014, 6, 6867–6896.
30. Shirmard, H.; Farahbakhsh, E.; Müller, R.D.; Chandra, R. A review of machine learning in processing remote sensing data for mineral exploration. Remote Sens. Environ. 2022, 268, 112750.
31. Chollet, F. Deep Learning with Python, 1st ed.; Manning Publications Co.: Shelter Island, NY, USA, 2017; ISBN 1617294438.
32. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2009; ISBN 9780387848587.
33. Morales, E.F.; Zaragoza, J.H. An introduction to reinforcement learning. In Decision Theory Models for Applications in Artificial Intelligence: Concepts and Solutions; IGI Global: Hershey, PA, USA, 2011; pp. 63–80. ISBN 9781609601652.
34. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Quackenbush, L.; Adeli, S. Google Earth Engine for geo-big data applications: A meta-analysis and systematic review. ISPRS J. Photogramm. Remote Sens. 2020, 164, 152–170.
35. Pelletier, C.; Valero, S.; Inglada, J.; Champion, N.; Sicre, C.M.; Dedieu, G. Effect of training class label noise on classification performances for land cover mapping with satellite image time series. Remote Sens. 2017, 9, 173.
36. Hird, J.N.; DeLancey, E.R.; McDermid, G.J.; Kariyeva, J. Google Earth Engine, open-access satellite data, and machine learning in support of large-area probabilistic wetland mapping. Remote Sens. 2017, 9, 1315.
37. Amani, M.; Ghorbanian, A.; Ahmadi, S.A.; Kakooei, M.; Moghimi, A.; Mirmazloumi, S.M.; Alizadeh Moghaddam, S.; Mahdavi, S.; Ghahremanloo, M.; Parsian, S.; et al. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5326–5350.
38. Ahmad, S.; Ali, F.; Ahmad, I.; Hamidullah, S. Geological map of the Kohat Plateau, NW Himalaya, NWFP. Peshawar, Pakistan. Geol. Bull. Univ. Peshawar 2001, 34.
39. Meissner, C.R.; Master, J.M.; Rashid, M.A.; Hussain, M. Stratigraphy of the Kohat Quadrangle, Pakistan; USGS professional paper 716-D; U.S. Govt. Print. Off.: Washington, DC, USA, 1974; 29p.
40. Bachri, I.; Hakdaoui, M.; Raji, M.; Benbouziane, A.; Mhamdi, H.S. Identification of Lithology Using Sentinel-2A Through an Ensemble of Machine Learning Algorithms. Int. J. Appl. Geospatial Res. 2022, 13, 1–17.
41. Wang, Z.; Zuo, R.; Dong, Y. Mapping of Himalaya leucogranites based on ASTER and Sentinel-2A datasets using a hybrid method of metric learning and random forest. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1925–1936.
42. Janati, M.H.; Soulaimani, A.; Admou, H.; Youbi, N.; Hafid, A.; Hefferan, K.P. Application of ASTER remote sensing data to geological mapping of basement domains in arid regions: A case study from the Central Anti-Atlas, Iguerda inlier, Morocco. Arab. J. Geosci. 2013, 7, 2407–2422.
43. Fal, S.; Maanan, M.; Baidder, L.; Rhinane, H. The contribution of Sentinel-2 satellite images for geological mapping in the south of Tafilalet basin (Eastern Anti-Atlas, Morocco). In Proceedings of the 5th International Conference on Geoinformation Science—GeoAdvances, Casablanca, Morocco, 10–11 October 2018; The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences: Casablanca, Morocco, 2019; Volume XLII-4/W12, pp. 75–82.
44. Abbasi, N.; Yasin, M. Petrography and Diagenetic History of Nagri Formation Sandstone in District Bagh and Muzaffarabad, Pakistan. Pakistan J. Geol. 2017, 1, 21–23.
45. Pivnik, D.A.; Sercombe, W.J. Transpression- and compression-related, evaporite-controlled faulting and folding in the Kohat Plateau, NW Pakistan, in Himalayan tectonics: (Searle, M.P., and Treloar, P.J., eds.). Geol. Soc. London Spec. Publ. 1993, 74, 559–580.
46. Pivnik, D.A.; Wells, N.A. The transition from Tethys to the Himalaya as recorded in northwest Pakistan. Geol. Soc. Am. Bull. 1996, 108, 1295–1313.
47. Ullah, K. Lithofacies, Petrography and Geochemistry of the Neogene Molasse Sequence of Himalayan Foreland Basin, Southwestern Kohat, Pakistan. Ph.D. Thesis, National Centre of Excellence in Geology, University of Peshawar, Peshawar, Pakistan, 2009.
48. Hussain, H.; Zhang, S. Structural evolution of the Kohat fold and thrust belt in the Shakardarra area (South eastern Kohat, Pakistan). Geosciences 2018, 8, 311.
49. Bilal, A.; Khan, M.S. Petrography and Provenance of Sandstone and Studies of Shale of Kuldana Formation, Kalamula and Khursheedabad Area, Kahuta, Azad Kashmir. Earth Sci. Malaysia 2017, 1, 21–31.
50. Bilqees, R.; Shah, T. Industrial Applications of Limestone Deposits of Kohat, NWFP: A Research Towards the Sustainability of the Deposits. Pak. J. Sci. Ind. Res. 2007, 50, 293–298.
51. Mughal, M.S.; Zhang, C.; Du, D.; Zhang, L.; Mustafa, S.; Hameed, F.; Khan, M.R.; Zaheer, M.; Blaise, D. Petrography and provenance of the Early Miocene Murree Formation, Himalayan Foreland Basin, Muzaffarabad, Pakistan. J. Asian Earth Sci. 2018, 162, 25–40.
52. Ullah, K.; Arif, M.; Shah, M.T. Petrography of Sandstones from the Kamlial and Chinji Formations, Southwestern Kohat Plateau, NW Pakistan: Implications for Source Lithology and Paleoclimate. J. Himal. Earth Sci. 2006, 39, 1–13.
53. Ali, A.; Nabi, A.; Zhong, F.; Pan, J.; Yan, J. Genesis of sandstone type uranium deposit in Dhok Pathan Formation, Siwaliks Group of Trans-Indus Salt Range (Surghar range), Pakistan. In Proceedings of the International Symposium on Uranium Raw Material for the Nuclear Fuel Cycle: Exploration, Mining, Production, Supply and Demand, Economics and Environmental Issues (URAM-2018), Vienna, Austria, 25–29 June 2018; pp. 24–27.
54. van der Werff, H.; van der Meer, F. Sentinel-2A MSI and Landsat 8 OLI Provide Data Continuity for Geological Remote Sensing. Remote Sens. 2016, 8, 883.
55. Bannari, A.; El-Battay, A.; Bannari, R.; Rhinane, H. Sentinel-MSI VNIR and SWIR bands sensitivity analysis for soil salinity discrimination in an arid landscape. Remote Sens. 2018, 10, 855.
56. Din, S.U.; Muhammad, K.; Khan, M.F.A.; Bashir, S.; Sajid, M.; Khan, A. A fusion of feature-oriented principal components of multispectral data to map granite exposures of Pakistan. Appl. Sci. 2021, 11, 11486.
57. Vapnik, V. The support vector method of function estimation. In Nonlinear Modeling; Suykens, J.A.K., Vandewalle, J., Eds.; Springer: Boston, MA, USA, 1998; pp. 55–85.
58. Hsu, C.W.; Chang, C.C.; Lin, C.J. A Practical Guide to Support Vector Classification; Department of Computer Science National Taiwan University Taipei 106: Taiwan, China, 2010; pp. 1–16.
59. Karatzoglou, A.; Meyer, D.; Hornik, K. Support Vector Algorithm in R. J. Stat. Softw. 2006, 15, 1–28.
60. Damaševičius, R. Structural analysis of regulatory DNA sequences using grammar inference and Support Vector Machine. Neurocomputing 2010, 73, 633–638.
61. Haykin, S. Neural Network: A Comprehensive Foundation, 2nd ed.; Prentice Hall: Delhi, India, 2001; ISBN 81-7808-300-0.
62. Gajawada, S.K. The Math behind Artificial Neural Networks. Available online: https://towardsdatascience.com/the-heart-of-artificial-neural-networks-26627e8c03ba (accessed on 20 February 2022).
63. Kotsiantis, S.B. Supervised Machine Learning: A Review of Classification Techniques. Informatica 2007, 31, 249–268.
64. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; ISBN 9780262035613.
65. Bindal, A. Normalization Techniques in Deep Neural Networks. Available online: https://medium.com/techspace-usict/normalization-techniques-in-deep-neural-networks-9121bf100d8 (accessed on 21 February 2022).
66. Jain, V. Everything You Need to Know about “Activation Functions” in Deep Learning Models. Available online: https://towardsdatascience.com/everything-you-need-to-know-about-activation-functions-in-deep-learning-models-84ba9f82c253 (accessed on 22 February 2022).
67. Nie, F.; Hu, Z.; Li, X. An investigation for loss functions widely used in machine learning. Commun. Inf. Syst. 2018, 18, 37–52.
68. Story, M.; Congalton, R.G. Accuracy Assessment: A User’s Perspective. Photogramm. Eng. Remote Sens. 1986, 52, 397–399.
69. Brown, D.G.; Lusch, D.P.; Duda, K.A. Supervised classification of types of glaciated landscapes using digital elevation data. Geomorphology 1998, 21, 233–250.
70. Bentahar, I.; Raji, M. Comparison of Landsat OLI, ASTER, and Sentinel 2A data in lithological mapping: A Case study of Rich area (Central High Atlas, Morocco). Adv. Sp. Res. 2021, 67, 945–963.
71. Lin, J.; Wang, R.; Zhao, B.; Cheng, S. A comprehensive scheme for lithological mapping using Sentinel-2A and ASTER GDEM in weathered and vegetated coastal zone, Southern China. Open Geosci. 2019, 11, 982–996.
72. Ahmed, W.; Muhammad, K.; Glass, H.J.; Chatterjee, S.; Khan, A.; Hussain, A. Novel MLR-RF-Based Geospatial Techniques: A Comparison with OK. ISPRS Int. J. Geo-Inf. 2022, 11, 371.
73. Hushko, S.; Botelho, J.M.; Maksymova, I.; Slusarenko, K.; Kulishov, V. Sustainable development of global mineral resources market in Industry 4.0 context. In Proceedings of the IOP Conference Series: Earth and Environmental Science, 8th International Scientific Conference on Sustainability in Energy and Environmental Science, Ivano-Frankivsk, Ukraine, 21–22 October 2020; Volume 628, p. 012025.
74. Aziz, A.; Schelén, O.; Bodin, U. A Study on Industrial IoT for the Mining Industry: Synthesized Architecture and Open Research Directions. IoT 2020, 1, 529–550.
75. Liu, Y.; Dhakal, S. Internet of Things technology in mineral remote sensing monitoring. Int. J. Circuit Theory Appl. 2020, 48, 2065–2077.
76. Molaei, F.; Rahimi, E.; Siavoshi, H.; Afrouz, S.G.; Tenorio, V. A Comprehensive Review on Internet of Things (IoT) and its Implications in the Mining Industry. Am. J. Eng. Appl. Sci. 2020, 13, 499–515.
77. Jimenez, J.R.C.; Zhao, P.; Mansourian, A.; Brovelli, M.A. Geospatial Blockchain: Review of decentralized geospatial data sharing systems. Agil. GIScience Ser. 2022, 3, 1–6.

Figure 1. Location of the study area in the Kohat–Karak districts of the Khyber Pakhtunkhwa province of Pakistan (blue), the extent defined by corner points (1) 33°30′9.83″ N, 70°45′2.05″ E; (2) 33°30′9.83″ N, 71°54′14.87″ E; (3) 32°55′37.63″ N, 71°54′14.87″ E; and (4) 32°55′37.63″ N, 70°45′2.05″ E. The extent of a published geological map of the area (Figure S1 [38]) is in black.
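The corner points in the caption are given in degrees, minutes, and seconds. As a small illustrative sketch (the helper name and the bounding-box ordering are assumptions, and the conversion as written applies only to northern and eastern coordinates), they can be converted to decimal degrees, the form most GIS tools expect for a rectangular extent:

```python
def dms_to_decimal(degrees, minutes, seconds):
    """Convert degrees/minutes/seconds to decimal degrees (N and E hemispheres)."""
    return degrees + minutes / 60.0 + seconds / 3600.0


# Corner coordinates transcribed from the Figure 1 caption.
north = dms_to_decimal(33, 30, 9.83)
south = dms_to_decimal(32, 55, 37.63)
west = dms_to_decimal(70, 45, 2.05)
east = dms_to_decimal(71, 54, 14.87)

# Hypothetical ordering (west, south, east, north) for a rectangular extent.
bbox = (west, south, east, north)
print([round(value, 6) for value in bbox])
```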

Figure 2. The architecture of an artificial neural network.

Figure 3. Block diagram of lithological classification applied in the study.

Figure 4. Average surface reflectance based on training samples selected from Sentinel-2 data for ten lithological units.

Figure 5. NDVI (top) of the study area after preprocessing; sample polygons (bottom) collected and annotated using information from the published geological map (Figure S1) and the PCA and MNF results (Figure 6) of the study area defined by (1) 33°30′9.83″ N, 70°45′2.05″ E; (2) 33°30′9.83″ N, 71°54′14.87″ E; (3) 32°55′37.63″ N, 71°54′14.87″ E; and (4) 32°55′37.63″ N, 70°45′2.05″ E.

Figure 6. MNF (top) and false-color composite (bottom) of PC components (PC-3, PC-4, and PC-5) of the study area defined by corner points (1) 33°30′9.83″ N, 70°45′2.05″ E; (2) 33°30′9.83″ N, 71°54′14.87″ E; (3) 32°55′37.63″ N, 71°54′14.87″ E; and (4) 32°55′37.63″ N, 70°45′2.05″ E.

Figure 7. Final classified maps using ANN (top) and SVM (bottom), bounded by the corner points (1) 33°30′9.83″ N, 70°45′2.05″ E; (2) 33°30′9.83″ N, 71°54′14.87″ E; (3) 32°55′37.63″ N, 71°54′14.87″ E; and (4) 32°55′37.63″ N, 70°45′2.05″ E.

Figure 8. Field photographs of the lithostratigraphic units in the mapped area. (a) Looking north at contacts among the Kohat, Murree, and Kamlial formations. (b) Evaporites (Jatta Gypsum and Bahadur Khel Salt). (c) Looking northeast at contacts among the Jatta Gypsum, Chinji, and Nagri formations. (d) Looking north at the contact between the Chinji and Nagri formations (man’s height = 6 feet).

Table 1. The training and testing samples for different classes.

Lithological Units	Time Scale	Training Samples	Testing Samples
Alluvium (A)	Miocene	2561	1100
Dhok Patan (D)	Miocene	1430	625
Nagri (N)	Miocene	1701	725
Chinji (C)	Miocene	1408	589
Kamlial Sandstone (KS)	Miocene	1993	833
Murree (M)	Miocene	548	261
Kohat (K)	Eocene	3042	1297
Mami Khel Clay (MK)	Eocene	564	249
Jatta Gypsum (G)	Eocene	2745	1100
Water (W)	NA	1073	496
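The counts in Table 1 can be summarized to show the overall train/test proportion and the class imbalance, which may be relevant to the lower accuracy reported for the Murree Formation. This is an illustrative calculation on the tabulated numbers only; the variable names are not from the study.

```python
# (training samples, testing samples) per class, transcribed from Table 1.
SAMPLES = {
    "Alluvium (A)": (2561, 1100),
    "Dhok Patan (D)": (1430, 625),
    "Nagri (N)": (1701, 725),
    "Chinji (C)": (1408, 589),
    "Kamlial Sandstone (KS)": (1993, 833),
    "Murree (M)": (548, 261),
    "Kohat (K)": (3042, 1297),
    "Mami Khel Clay (MK)": (564, 249),
    "Jatta Gypsum (G)": (2745, 1100),
    "Water (W)": (1073, 496),
}

total_train = sum(train for train, _ in SAMPLES.values())
total_test = sum(test for _, test in SAMPLES.values())
test_fraction = total_test / (total_train + total_test)

# Classes with the fewest and the most training samples.
smallest_class = min(SAMPLES, key=lambda name: SAMPLES[name][0])
largest_class = max(SAMPLES, key=lambda name: SAMPLES[name][0])
imbalance_ratio = SAMPLES[largest_class][0] / SAMPLES[smallest_class][0]

print(f"training: {total_train}, testing: {total_test}, "
      f"test fraction: {test_fraction:.1%}")
print(f"smallest class: {smallest_class}, largest class: {largest_class}, "
      f"ratio: {imbalance_ratio:.2f}")
```

On these counts, about 30% of the labeled samples are held out for testing, and the largest class has roughly 5.6 times as many training samples as the smallest.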

Table 2. Important hyperparameters of SVM and ANN used in the study.

SVM		ANN
Kernel type	1st-degree polynomial	Number of hidden layers	4
Gamma (g)	1/6	Activation function	ReLU and softmax
Cost (C)	0.02	Loss function	Categorical cross-entropy
		Optimizer	Adam with a learning rate of 0.0001
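The SVM settings in Table 2 can be read through the LIBSVM-style polynomial kernel, K(u, v) = (gamma · u·v + coef0)^degree. The sketch below is illustrative only: Table 2 does not report coef0, so the value 0 is an assumption, and the two pixel vectors are hypothetical six-band reflectances. With degree 1 and coef0 = 0, the kernel reduces to a scaled dot product, so the decision boundaries are linear in the band values.

```python
GAMMA = 1.0 / 6.0   # Table 2
DEGREE = 1          # Table 2: 1st-degree polynomial
COEF0 = 0.0         # assumption: not reported in Table 2


def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, gamma=GAMMA, coef0=COEF0, degree=DEGREE):
    """LIBSVM-style polynomial kernel: (gamma * u.v + coef0) ** degree."""
    return (gamma * dot(u, v) + coef0) ** degree


# Hypothetical six-band reflectance vectors for two pixels.
pixel_a = [0.12, 0.15, 0.18, 0.25, 0.30, 0.27]
pixel_b = [0.10, 0.14, 0.20, 0.28, 0.33, 0.29]

k_ab = polynomial_kernel(pixel_a, pixel_b)
print("K(a, b) =", round(k_ab, 6))
print("gamma * a.b =", round(GAMMA * dot(pixel_a, pixel_b), 6))
```

The cost parameter C = 0.02 does not enter the kernel; it weights the penalty on misclassified training samples, and a small value such as this favors a wider margin.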

Table 3. Training and testing accuracies and kappa coefficients of SVM and ANN.

Algorithm	Training Accuracy (%)	Testing Accuracy (%)	Kappa Coefficient
SVM	95.98	95.61	0.95
ANN	94.48	95.73	0.95

Table 4. Confusion matrix of artificial neural network (ANN) (with Adam optimizer).

Formations	G	DP	C	N	K	MK	M	KS	A	W	Producer Accuracy (%)
Jatta Gypsum (G)	1094	0	0	0	4	0	0	2	0	0	99.5
Dhok Patan (DP)	2	600	0	2	11	0	0	7	3	0	96.0
Chinji (C)	2	3	547	15	15	0	0	1	5	1	92.9
Nagri (N)	0	1	2	643	2	1	10	65	1	0	88.7
Kohat (K)	1	2	5	0	1285	0	1	1	2	0	99.1
Mami Khel (MK)	0	0	0	0	4	237	7	1	0	0	95.2
Murree (M)	0	0	2	0	10	25	202	20	2	0	77.4
Kamlial (KS)	1	7	1	10	3	6	1	796	8	0	95.6
Alluvium (A)	0	18	1	0	1	0	1	5	1074	0	97.6
Water (W)	0	1	6	0	3	0	0	0	0	486	98.0
User Accuracy (%)	99.5	94.9	97.0	96.0	96.0	88.1	91.0	88.6	98.1	99.8

Table 5. Confusion matrix of support vector machine (SVM).

Formations	G	DP	C	N	K	MK	M	KS	A	W	Producer Accuracy (%)
Jatta Gypsum (G)	1128	1	0	0	0	0	0	8	0	0	99.2
Dhok Patan (DP)	0	578	3	8	6	0	0	3	27	0	92.5
Chinji (C)	0	0	584	6	6	0	0	0	12	0	96.1
Nagri (N)	3	4	7	691	1	0	1	32	1	0	93.4
Kohat (K)	0	3	7	0	1285	4	3	0	2	0	98.5
Mami Khel (MK)	0	0	0	0	3	225	16	0	0	0	92.2
Murree (M)	0	0	1	0	18	14	186	8	0	0	81.9
Kamlial (KS)	1	2	0	21	2	2	6	777	6	0	95.1
Alluvium (A)	0	21	15	2	2	0	7	2	1053	0	95.6
Water (W)	0	4	2	0	1	0	0	0	1	413	98.1
User Accuracy (%)	99.6	94.3	94.3	94.9	97.1	91.8	84.9	93.6	95.6	100.0

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Elahi, F.; Muhammad, K.; Din, S.U.; Khan, M.F.A.; Bashir, S.; Hanif, M. Lithological Mapping of Kohat Basin in Pakistan Using Multispectral Remote Sensing Data: A Comparison of Support Vector Machine (SVM) and Artificial Neural Network (ANN). Appl. Sci. 2022, 12, 12147. https://doi.org/10.3390/app122312147

