1. Introduction
Thematic classification methods are an important tool in application fields that rely on spatial analysis in Geographical Information Systems (for short, GIS), such as environmental sciences, urban planning, climate risk analysis, and geomorphological studies of territory.
Thematic maps are built in GIS environments by adopting thematic classification techniques to partition elements of a spatial dataset (the theme). These techniques perform a partitioning into equivalence classes of the domain of one or more features of a spatial dataset using specific rules for class construction. Each class, called a thematic class, is assigned a linguistic label and a symbol with which the spatial elements belonging to it will be displayed on the map.
A well-known thematic classification method partitions the theme into a number of classes equal to the number of unique values of a set of one or more features; this method, called Unique Value, is applicable only when the number of unique values of the features considered is much smaller than the number of entities of the theme to be represented on the map. When this is not the case, for example when the domain of a feature is a real interval, it is necessary to resort to other classification methods that segment this numerical interval into contiguous and disjoint sub-intervals. Each sub-interval, characterized by the values of its bounds, called breaks, is associated with a thematic class, to which the entities whose value of the considered feature belongs to the sub-interval are assigned [1].
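As a minimal sketch of break-based classification, the snippet below assigns each feature value to the sub-interval that contains it. The break values, the labels, and the convention that a value equal to a break falls in the lower class are illustrative assumptions, not taken from the cited methods.

```python
from bisect import bisect_left

def classify_by_breaks(value, breaks):
    """Return the 0-based index of the sub-interval containing value.

    breaks holds the inner break values in ascending order. A value equal
    to a break is placed in the lower class (an assumed convention).
    """
    return bisect_left(breaks, value)

# Hypothetical breaks splitting a feature domain into three thematic classes.
breaks = [10.0, 20.0]
labels = ["Low", "Medium", "High"]
values = [3.5, 10.0, 12.0, 27.1]
classes = [labels[classify_by_breaks(v, breaks)] for v in values]
```

Each entity belongs to exactly one class here, which is the crisp behaviour that the fuzzy approach discussed later relaxes.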
One of the best-known thematic classification methods is the Natural Breaks method, introduced by Jenks [2]. Natural Breaks is an optimization method that determines the values of the breaks of each thematic class so as to minimize the within-class variance, i.e., the sum of the squared deviations of the feature values of the elements of each class from their class mean [2]. This method is called natural because it partitions the theme in such a way that, with respect to the selected feature, all the elements of a class are similar to one another and dissimilar to those of other classes.
The main flaw of the canonical thematic classification methods is the impossibility of attributing uncertainty to the classification. In order to manage this uncertainty, it is necessary to use a method based on the fuzzy partitioning of the domain of the selected feature, assigning a membership degree of an element to each thematic class. This approach has the advantage of modeling the expert’s reasoning more accurately and of providing measures for assessing the reliability of the resulting thematic map.
In [3], a fuzzy multilevel comprehensive model is applied to assess the reliability of thematic maps.
A fuzzy classification method applied to spatial image data is proposed in [4], where the Morisita index [5] is used to assess the reliability of the thematic map. This index measures the similarity between two sets of data; it is used in [4] to assess the accuracy of the thematic map obtained using a fuzzy partition of the feature. This index is computationally expensive to calculate and is difficult to interpret as a measure of the reliability of the resulting thematic map.
In [6], a logistic function proposed in [7] is used to fuzzify the domain of a numerical feature and to create a fuzzy classification of this feature. The critical point of this approach is that the resulting thematic classes can be difficult for the user to interpret.
A fuzzy thematic classification method based on fuzzy partitioning of a feature is proposed in [8]. The study area is partitioned into subzones, and in each subzone a method based on the term frequency–inverse document frequency index (TF–IDF) is applied to measure the relevance of a type of problem in that subzone, extracted from the reports made by citizens over a period of time. The authors create a Ruspini fuzzy partition [9] of the measured relevance to construct thematic maps of the relevance of the type of problem. This approach has the advantage of producing thematic classifications connected to the user’s reasoning, creating thematic maps that can be easily interpreted by the user.
The main shortcoming of the fuzzy-based thematic classification methods proposed in the literature is their difficult interpretability by the user; in fact, these methods do not take into account the user’s approximate reasoning in the construction of thematic classes. Another critical point of these approaches is their inadequacy in assessing the reliability of the resulting thematic map.
In this research, we propose a fuzzy classification method based on the fuzzy partitioning of the selected feature, measuring the reliability of the resulting thematic map using the Fuzzy Entropy technique introduced by De Luca and Termini [10,11].
Fuzzy Entropy was used in [12] to measure the reliability of clusters detected in hotspot analysis; the Extended Fuzzy C-Means algorithm [13], an extended version of the Fuzzy C-Means algorithm [14,15,16], is applied to detect hotspots on the map, and Fuzzy Entropy measures the fuzziness of the final fuzzy clusters.
In [17], a new validity index based on Fuzzy Entropy is proposed to find the best number of clusters in Fuzzy C-Means. In [18], Fuzzy Entropy is applied to find the optimal values of the cluster centers in Fuzzy C-Means.
We use Fuzzy Entropy to measure the fuzziness of the fuzzy sets composing the fuzzy partition of the domain of the selected feature.
The goal of our method is to determine the most suitable fuzzy partition to build the thematic map of the selected feature by measuring the fuzzy entropy of the fuzzy sets that make it up. Our method iteratively builds finer fuzzy partitions starting from an initial fuzzy partition. At each iteration, the fuzzy sets with the highest fuzzy entropy are split. The process ends when the fuzzy entropy of the fuzzy partition is less than a predetermined threshold, where the fuzzy entropy of the fuzzy partition is given by the average of the fuzzy entropy of its fuzzy sets.
The final fuzzy partition is used to create the thematic map showing the spatial distribution of the selected feature in the study area; a reliability is assigned to this thematic map according to the fuzzy entropy of the fuzzy partition used.
We call our method Thematic Fuzzy Entropy Partition (for short, TFEP).
Initially, the user, based on their experience, creates a Ruspini fuzzy partition, setting its cardinality and using triangular fuzzy numbers to create the fuzzy sets.
Then, the fuzzy entropy of each fuzzy set is measured and the fuzzy entropy of the fuzzy partition is computed. If it is less than or equal to a prefixed threshold, the algorithm stops, and the thematic map is constructed and assigned a reliability measure that is a function of the fuzzy entropy of the fuzzy partition. Otherwise, the fuzzy partition is reconstructed by splitting the fuzzy set with the highest fuzzy entropy into two fuzzy sets; then, the fuzzy entropies of all the fuzzy sets are recalculated. This process is iterated until the fuzzy entropy of the fuzzy partition is less than or equal to the threshold.
These steps are schematized in Figure 1.
The main benefits of TFEP over the thematic classification methods well-known in the literature are summarized below:
- TFEP is applicable to any type of spatial data and builds the resulting thematic map from an initial fuzzy partition of the selected feature created by the user;
- The reliability of the resulting thematic map is assessed by using the fuzzy entropy. The reliability estimate allows the user to evaluate to what extent the thematic classification of the analyzed feature provides a spatial distribution of the entities included in the study area on the map adhering to the values assumed by the feature;
- The use of an iterative process of measurement of the fuzzy entropy of the fuzzy partition allows generation of a fuzzy partition whose fuzzy entropy is less than or equal to a specified threshold, in order to build a thematic map with an acceptable level of reliability.
Section 2 briefly describes the definition of fuzzy partition according to Ruspini and the measure of fuzzy entropy by De Luca and Termini. The TFEP method is presented in Section 3; Section 4 shows the results of its application in a study area given by the municipalities of the province of Florence, in Italy. Final considerations and future perspectives are included in Section 5.
3. The Thematic Fuzzy Entropy Partition Method
We propose an iterative fuzzy thematic classification method to find the best fuzzy partition of the domain of the selected feature; we use the fuzzy entropy measure to assess the fuzziness of a fuzzy partition and to evaluate the reliability of the final thematic map.
Let E = {e1, e2, …, eN} be a theme formed by N georeferenced entities ei, i = 1, 2, …, N. The aim of the TFEP method is to find the best thematic classification of the entities of E based on a selected feature.
Let X = {x1, x2, …, xN} be the numerical discrete set of the feature values, where xi is the feature value taken by the entity ei. X is a subset of a closed real interval U.
Let F = {A1, A2, …, AM} be a fuzzy partition of U, where Ak: U → [0, 1], k = 1, 2, …, M. The user assigns a label Ck to each fuzzy set Ak.
F is the initial fuzzy partition created by the user to construct a thematic classification of the entities of the theme E using triangular fuzzy sets. The choice of triangular fuzzy sets is motivated by their ease of representation and modeling as fuzzy numbers expressed by triplets of numbers.
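A minimal sketch of how such an initial partition could be represented follows. It assumes that the user supplies only the ordered peak values of the fuzzy sets and that each triplet takes its neighbours' peaks as its feet, so that the membership degrees of any value in the domain sum to 1 (the Ruspini condition). The function names are hypothetical.

```python
def triangular(x, a, b, c):
    """Membership degree of x in the triangular fuzzy number (a, b, c).

    a == b gives a decreasing shoulder at the left end of the domain;
    b == c gives an increasing shoulder at the right end.
    """
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0

def ruspini_partition(peaks):
    """Build one triplet (a, b, c) per peak, using the neighbouring peaks as feet."""
    triplets = []
    for k, b in enumerate(peaks):
        a = peaks[k - 1] if k > 0 else b
        c = peaks[k + 1] if k < len(peaks) - 1 else b
        triplets.append((a, b, c))
    return triplets

# Hypothetical domain U = [0, 30] with four fuzzy sets.
partition = ruspini_partition([0.0, 10.0, 20.0, 30.0])
degrees_at_17 = [triangular(17.0, *t) for t in partition]
```

Storing each fuzzy set as a plain triplet keeps the later splitting step a matter of rewriting three numbers per affected set.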
TFEP measures the fuzziness of the fuzzy sets A1, A2, …, AM using (5) and computes the fuzziness H(F) of the fuzzy partition F, given by the average of the fuzziness of the M fuzzy sets:
$$H(F) = \frac{1}{M} \sum_{k=1}^{M} H(A_k) \qquad (7)$$
If H(F) is greater than a fuzziness threshold HTh, the fuzzy set having the highest fuzziness is split into two fuzzy sets and a new, finer fuzzy partition formed by M + 1 fuzzy sets is created. If Ak is the fuzzy set with the highest fuzziness, it is split into two fuzzy sets to which the following labels are assigned, respectively: almost Ck and more than Ck.
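The averaging step can be sketched as follows. The per-set fuzziness used here is an assumption: the De Luca and Termini entropy normalized by the number of elements, with base-2 logarithms, so that each value lies in [0, 1]. The function names are hypothetical.

```python
from math import log2

def shannon(u):
    """Shannon function h(u): 0 at u = 0 and u = 1, maximal (1.0) at u = 0.5."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * log2(u) - (1.0 - u) * log2(1.0 - u)

def set_fuzziness(memberships):
    """Assumed normalized De Luca-Termini entropy of one fuzzy set."""
    return sum(shannon(u) for u in memberships) / len(memberships)

def partition_fuzziness(membership_rows):
    """Return H(F), the average fuzziness of the M fuzzy sets, and the per-set values."""
    per_set = [set_fuzziness(row) for row in membership_rows]
    return sum(per_set) / len(per_set), per_set

# One row of membership degrees per fuzzy set (illustrative values only).
h_f, per_set = partition_fuzziness([[0.5, 0.5], [1.0, 0.0]])
```

A set whose elements all have membership 0.5 is maximally fuzzy, while a crisp set contributes nothing to H(F).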
The splitting of the fuzzy set A, consisting of a fuzzy number (a, b, c), into two fuzzy sets is achieved by constructing two fuzzy numbers (a1, b1, c1) and (a2, b2, c2), where:
Moreover, in order to respect the Ruspini condition:
- If the previous fuzzy set exists and it consists of the fuzzy number (aPrev, bPrev, cPrev), the value of cPrev, previously set equal to b, is changed to b1;
- If the next fuzzy set exists and it consists of the fuzzy number (aNext, bNext, cNext), the value of aNext, previously set equal to b, is changed to b2.
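The neighbour updates can be sketched in code. The positions of the new peaks are an assumption made only for illustration: b1 and b2 are taken as the midpoints of [a, b] and [b, c], which may differ from the rule in (8). The remaining parameters then follow from the Ruspini condition. The function name is hypothetical.

```python
def split_fuzzy_set(triplets, k):
    """Split triplets[k] = (a, b, c) into two triangular fuzzy numbers.

    ASSUMPTION: the new peaks b1 and b2 are the midpoints of [a, b] and
    [b, c]; this choice is illustrative, not the source's rule.
    """
    a, b, c = triplets[k]
    b1, b2 = (a + b) / 2.0, (b + c) / 2.0
    new = list(triplets)
    new[k:k + 1] = [(a, b1, b2), (b1, b2, c)]
    if k > 0:
        a_prev, b_prev, _ = new[k - 1]
        new[k - 1] = (a_prev, b_prev, b1)   # cPrev moves from b to b1
    if k + 2 < len(new):
        _, b_next, c_next = new[k + 2]
        new[k + 2] = (b2, b_next, c_next)   # aNext moves from b to b2
    return new

# Hypothetical three-set partition of [0, 20]; the middle set is split.
before = [(0, 0, 10), (0, 10, 20), (10, 20, 20)]
after = split_fuzzy_set(before, 1)
```

With this choice, every pair of adjacent fuzzy sets still overlaps on exactly one interval between consecutive peaks, so the memberships continue to sum to 1.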
The process is iterated until the fuzziness of the current fuzzy partition is less than or equal to HTh.
Finally, the thematic map of E is built using the fuzzy partition found. An entity is assigned to the class corresponding to the fuzzy set to which it belongs with the highest degree.
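The assignment rule amounts to taking the class with the largest membership degree. In the sketch below, the two membership functions and their labels are hypothetical, and ties are resolved in favour of the first class listed, which is an assumption.

```python
def assign_class(x, membership_functions):
    """Return the label with the highest membership degree for x, and that degree.

    membership_functions maps a class label to a function U -> [0, 1].
    """
    degrees = {label: mu(x) for label, mu in membership_functions.items()}
    label = max(degrees, key=degrees.get)
    return label, degrees[label]

# Hypothetical two-class Ruspini partition of the domain [0, 10].
partition = {
    "Low": lambda x: max(0.0, min(1.0, (10.0 - x) / 10.0)),
    "High": lambda x: max(0.0, min(1.0, x / 10.0)),
}
label, degree = assign_class(2.0, partition)
```

Returning the degree alongside the label keeps the information that a purely crisp assignment would discard.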
The user can change the labels of the fuzzy sets in the final fuzzy partition if the user intends to combine them into semantically more meaningful names.
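A reliability is assigned to the thematic map, given by the complement of the fuzziness of the final fuzzy partition:
$$\mathrm{Reliability}(F) = 1 - H(F) \qquad (9)$$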
The TFEP algorithm is schematized below in Algorithm 1.
Algorithm 1: Thematic Fuzzy Entropy Partition (TFEP)
Initialize the fuzzy partition F given by M fuzzy sets
Set the fuzziness threshold HTh
H ← 1 //variable containing the fuzziness of the fuzzy partition
While H > HTh
    For k = 1 to M //for each fuzzy set of F
        Compute the fuzziness H(Ak) of Ak by (5)
    Next k
    Compute H(F) by (7)
    H ← H(F)
    If H > HTh
        Split the fuzzy set with the highest fuzziness into two fuzzy sets by (8)
        Assign the labels of the two fuzzy sets using the prefixes almost and more than
        M ← M + 1
        Construct the new fuzzy partition F having M fuzzy sets
    End If
End While
Create the thematic map using the fuzzy partition F
Compute the reliability of the thematic map by (9)
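The loop of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the ArcPy implementation used in the tests: the fuzziness is taken as the normalized De Luca and Termini entropy over the observed values, the split places the new peaks at the midpoints of [a, b] and [b, c], the reliability is taken as 1 − H(F), and max_sets is a hypothetical guard that guarantees termination. The stopping test follows Algorithm 1, i.e., it compares the average H(F) with HTh.

```python
from math import log2

def triangular(x, a, b, c):
    """Membership degree of x in the triangular fuzzy number (a, b, c)."""
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0

def shannon(u):
    return 0.0 if u <= 0.0 or u >= 1.0 else -u * log2(u) - (1 - u) * log2(1 - u)

def fuzziness(values, triplet):
    # ASSUMPTION: normalized De Luca-Termini entropy over the observed values.
    return sum(shannon(triangular(x, *triplet)) for x in values) / len(values)

def split(triplets, k):
    # ASSUMPTION: new peaks at the midpoints of [a, b] and [b, c].
    a, b, c = triplets[k]
    b1, b2 = (a + b) / 2.0, (b + c) / 2.0
    out = list(triplets)
    out[k:k + 1] = [(a, b1, b2), (b1, b2, c)]
    if k > 0:
        out[k - 1] = (out[k - 1][0], out[k - 1][1], b1)
    if k + 2 < len(out):
        out[k + 2] = (b2, out[k + 2][1], out[k + 2][2])
    return out

def tfep(values, triplets, labels, h_th, max_sets=32):
    """Refine the partition until H(F) <= h_th; return triplets, labels, reliability."""
    triplets, labels = list(triplets), list(labels)
    while True:
        per_set = [fuzziness(values, t) for t in triplets]
        h_f = sum(per_set) / len(per_set)
        if h_f <= h_th or len(triplets) >= max_sets:
            break
        k = max(range(len(per_set)), key=per_set.__getitem__)
        labels[k:k + 1] = ["almost " + labels[k], "more than " + labels[k]]
        triplets = split(triplets, k)
    return triplets, labels, 1.0 - h_f

# Hypothetical feature values on the domain [0, 10] with two initial fuzzy sets.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
final_triplets, final_labels, reliability = tfep(
    values, [(0, 0, 10), (0, 10, 10)], ["Low", "High"], h_th=0.60
)
```

In this toy run the two initial fuzzy sets are too fuzzy, a single split brings H(F) below the threshold, and the loop stops with three fuzzy sets.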
We now show an example of application of the TFEP method, considering the discrete set X of feature values shown in Table 1.
Initially, we construct a fuzzy partition with the four fuzzy sets in Figure 4. The four fuzzy sets A1, A2, A3, and A4 are labeled, respectively, Low, Medium low, Medium high, and High. A1 is an R-function, A2 and A3 are triangular fuzzy sets, and A4 is an L-function. Their membership degree functions are given by:
We set the fuzziness threshold HTh to 0.60, considering that the average degree of belonging of an element to a fuzzy set with a fuzziness equal to 0.6 is equal to 0.15 or 0.85.
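The link between a fuzziness of 0.6 and membership degrees near 0.15 or 0.85 can be checked numerically. The sketch assumes that the fuzziness is built on the Shannon function h(u) = −u log2 u − (1 − u) log2(1 − u) and inverts it by bisection on [0, 0.5], where h is increasing. The function names are hypothetical.

```python
from math import log2

def shannon(u):
    """Shannon function h(u), symmetric about u = 0.5."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * log2(u) - (1.0 - u) * log2(1.0 - u)

def membership_for_fuzziness(target, tol=1e-9):
    """Find u in [0, 0.5] such that shannon(u) is approximately target."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if shannon(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

u = membership_for_fuzziness(0.60)   # lower solution, close to 0.15
u_upper = 1.0 - u                    # symmetric solution, close to 0.85
```

Under this assumption, a threshold of 0.60 corresponds to elements that belong to a fuzzy set with a degree of roughly 0.15 or 0.85, so stricter thresholds require memberships closer to 0 or 1.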
Table 2 shows the values of the fuzziness calculated for the four fuzzy sets. The fuzzy set having the highest fuzziness is A2; its fuzziness is 0.66, greater than the threshold.
In the next step, the fuzzy set A2 is split into two fuzzy sets by (8). Table 3 shows the fuzziness of all fuzzy sets. Now, all fuzzy sets have a fuzziness less than the threshold and the process stops.
Figure 5 shows the final fuzzy partition for this example. The user, assigning the labels to the five thematic classes, can decide to use the label of the corresponding fuzzy set or to use other labels that the user deems semantically more suitable.
4. Experimental Results
We implemented the TFEP algorithm in the GIS tool ArcGIS Desktop 10.8, using the ArcPy Python library.
We tested TFEP on over 100 vector and image spatial datasets with different cardinalities, setting the fuzzy entropy threshold HTh to 0.60. For brevity, we show the results obtained for one vector and one image spatial dataset.
In the first test, we execute TFEP to extract a thematic map showing the spatial distribution of the number of inhabitants per residential building in an urban study area given by the 44 municipalities of the province of Florence (Italy). The source spatial dataset is a vector dataset consisting of census data provided by the Italian National Institute of Statistics (ISTAT) (https://www.istat.it/it/archivio/104317, accessed on 1 July 2022).
We asked an expert to create the initial fuzzy partition on the domain given by the numerical interval in which the number of inhabitants of a residential building varies; this fuzzy partition, composed of three triangular fuzzy numbers, is shown in Figure 7.
Each fuzzy set is expressed by a triplet (a, b, c); Table 4 shows the label of each fuzzy set, the type of triangular fuzzy number used, the values of the parameters in the triplet, and the measured fuzziness.
The fuzziness H(F) of this fuzzy partition, computed by (7), is 0.41; hence, the reliability of the thematic map in Figure 8 is 0.59.
Since the fuzziness of the fuzzy set Medium is higher than the threshold, TFEP splits this fuzzy set into two fuzzy sets, called, respectively, Almost Medium and More than medium. Figure 9 shows the new fuzzy partition having four fuzzy sets.
Table 5 shows the label of each fuzzy set, the type of triangular fuzzy number used, the values of the parameters in the triplet, and the measured fuzziness.
Now, the fuzziness of all fuzzy sets is below the threshold HTh. TFEP stops, generating the final thematic map shown in Figure 10.
The fuzziness H(F) of the final fuzzy partition, computed by (7), is 0.32 and the reliability of the thematic map in Figure 10 is 0.68.
Now, we show the results of another test executed to obtain a thematic map of a spatial image dataset. It is given by a 1 m × 1 m remote sensing image dataset of the Normalized Difference Vegetation Index (for short, NDVI) obtained in June 2022 by the Sentinel 2 satellite on the study area of the municipality of Naples (Italy).
The NDVI is obtained by a combination of the satellite Near-Infrared (NIR) and Red (R) bands, using the formula:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{R}}{\mathrm{NIR} + \mathrm{R}}$$
This allows evaluation of some characteristics of the vegetation. It varies between −1 and 1; values between −1 and 0 are typical of uncultivated and anthropogenic areas. Values between 0 and 1 correspond to cultivated or wooded areas; the closer these values are to 1, the greater the soil vegetation cover and the evapotranspiration capacity of the vegetation.
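As a minimal sketch, the index can be computed per pixel from the two band values with the standard definition NDVI = (NIR − R) / (NIR + R). The reflectance values below are hypothetical, and returning None when both bands are zero is an assumed convention for undefined pixels.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index of one pixel.

    Returns None when both bands are zero, where the index is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total

# Hypothetical reflectance pairs (NIR, R).
vegetated = ndvi(0.5, 0.1)    # positive: vegetation reflects strongly in NIR
built_up = ndvi(0.1, 0.5)     # negative
undefined = ndvi(0.0, 0.0)
```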
Figure 11 shows the NDVI image dataset on the study area.
The expert sets the initial fuzzy partition on the domain [−1, 1], given by four triangular fuzzy sets as in Figure 12.
Table 6 shows the label of each fuzzy set, the type of triangular fuzzy number used, the values of the parameters a, b, and c in the triplet, and the measured fuzziness. The highest fuzziness is shown in bold.
The fuzziness H(F) of this fuzzy partition, computed by (7), is 0.47; hence, the reliability of the thematic map in Figure 13 is 0.53.
Since the fuzziness of the fuzzy set Medium is higher than the threshold, TFEP splits this fuzzy set into two fuzzy sets, called, respectively, Almost Medium and More than medium. Figure 14 shows the new fuzzy partition having five fuzzy sets.
Table 7 shows the label of each fuzzy set, the type of triangular fuzzy number used, the values of the parameters in the triplet, and the measured fuzziness. The highest fuzziness is shown in bold.
The fuzziness H(F) of this fuzzy partition, computed by (7), is 0.46; hence, the reliability of the thematic map in Figure 15 is 0.54.
Since the fuzziness of the fuzzy set More than medium is higher than the threshold, TFEP splits this fuzzy set into two fuzzy sets, called, respectively, Almost more than medium and More more than medium. Figure 16 shows the new fuzzy partition having six fuzzy sets.
Table 8 shows the label of each fuzzy set, the type of triangular fuzzy number used, the values of the parameters in the triplet, and the measured fuzziness.
The fuzziness of all fuzzy sets is below the threshold HTh. TFEP stops. The expert decided to rename the labels of the six fuzzy sets in order to semantically represent the final thematic map more clearly. The new labels assigned by the expert to the final fuzzy sets are shown in Table 9.
The final thematic map is shown in Figure 17.
The fuzziness H(F) of the fuzzy partition, computed by (7), is 0.42; hence, the reliability of the final thematic map is 0.58.
To assess the benefits of our fuzzy thematic classification method in terms of increase in the map reliability, we measure the difference between the reliability of the final thematic map and the reliability of the initial thematic map, calling this difference Map-reliability gain.
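The gain for the two tests reported above can be recomputed directly. The sketch assumes that the reliability is the complement of the partition fuzziness, 1 − H(F), which is consistent with the pairs of values reported in this section. The function name is hypothetical.

```python
def map_reliability_gain(initial_fuzziness, final_fuzziness):
    """Final reliability minus initial reliability, with reliability = 1 - H(F)."""
    return (1.0 - final_fuzziness) - (1.0 - initial_fuzziness)

# H(F) values reported for the initial and final partitions of the two tests.
gain_vector = map_reliability_gain(0.41, 0.32)   # census dataset: 0.59 -> 0.68
gain_image = map_reliability_gain(0.47, 0.42)    # NDVI dataset: 0.53 -> 0.58
```

These give gains of about 0.09 and 0.05, respectively.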
Figure 18 shows the trend of the map-reliability gain with respect to the map reliability of the initial thematic map, obtained by measuring the map-reliability gain in all tests.
This trend shows that TFEP always increases the reliability of the initial thematic map, with an average gain between 0.04 and 0.12.
In order to compare the performance of TFEP with that of other fuzzy thematic classification methods, we measured, for each of the spatial datasets used in the tests, the reliability of the final thematic classification obtained using the methods proposed in [4,6,8].
Table 10 shows the average, standard deviation, minimum, and maximum values of the map-reliability gain with respect to the map reliability obtained with each of the other fuzzy thematic classification methods. The reliability gain is always positive and, on average, varies between 0.08 and 0.10.
These results show that TFEP produces thematic maps whose reliability is higher than that of thematic maps generated using other fuzzy thematic classification methods.
In summary, the results of our tests show that, regardless of the type and cardinality of the theme, TFEP builds reliable thematic maps by constructing fuzzy partitions whose fuzzy sets have a fuzziness not exceeding a predetermined threshold. In fact, a static assignment of the fuzzy partition can generate unreliable thematic maps due to the high fuzziness of some fuzzy sets. Assigning a spatial element to the fuzzy set with the highest membership degree introduces classification uncertainty, as the membership degrees to the other fuzzy sets are not considered; when the fuzziness of a fuzzy set is high, this uncertainty becomes non-negligible. TFEP’s strategy in these cases is to build a finer fuzzy partition by splitting the fuzzy set with the highest fuzziness.