1. Introduction
The role of geodata technologies (i.e., technologies used to collect, store, and analyze geodata) in humanitarian action is arguably indispensable; they are instrumental in determining when, where, and who needs aid before, during, and after a disaster (natural or man-made). Consequently, geodata technologies have been rapidly adapted by incorporating a variety of new geodata types, finding new uses for existing types, and adopting new analytical methodologies, e.g., the use of Artificial Intelligence (AI).
However, despite the advantages of using geodata technologies in humanitarianism (i.e., fast and efficient aid distribution), several ethical challenges arise, including threats to privacy. Privacy becomes a particularly pressing issue since the data subjects are often among the most vulnerable in society [
1,
2,
3]. These subjects are vulnerable not only because of disasters, but also because of existing and persistent socio-economic inequalities, injustices, and power imbalances. Moreover, privacy violations in the digital humanitarian space can be argued to challenge the humanity principle, e.g., refs. [
4,
5] (the humanity principle advocates for the unconditional safeguarding of every individual’s life and dignity [
4]) since privacy preserves human dignity [
6].
However, privacy continues to be a “contested concept” [
7] with diverging legal, technological, and cultural dimensions [
8]. Of all the conceptualizations of privacy, we focus on informational privacy, which is strongly associated with (geo)data technologies [
9]. Furthermore, viewing informational privacy through the lens of informational harms—as has been done for personal information by Van den Hoven [
10]—allows for a better understanding of the impacts of privacy violations. However, rather than focus on individual privacy and harms, we primarily consider group privacy, a debate that has only relatively recently arisen [
11], and how these group-related harms materialize from the use of geodata technologies. Informational harms not only undermine individuals’ control of knowledge about them but affect groups as well [
9].
Previously, the trend was to focus more on the individual (i.e., through personal data protection) and the informational privacy threats accompanying geodata collected on individuals [
12,
13]. Floridi [
14] lists four challenges in informational privacy and, giving “organizations” and “artificial constructs [of groups]” as examples, highlights the inadequacy of the individual privacy concept, which does not cater to groups. Unlike the problems of informed consent and re-identification familiar from personal data and individual privacy, threats to group privacy are not as straightforward [
15]. First, there is the challenge of defining groups, which has implications for the kind of group privacy in question [
16]. Grouping based on defining the properties of groups first tends towards “its” privacy, while grouping first and then defining objectives favors “their” privacy (the sum of individual privacy) [
16]. Despite the late start of debates on group privacy, concerns about group privacy are growing and compounded by the fact that aggregation, clustering, or grouping individuals based on some characteristics is often the objective of data analytics [
15,
16] and the methods used to create such groupings or recommend decisions for groups tend to be opaque and prone to biases. Therefore, there has been a call to also consider the group and informational privacy threats arising from the collection, aggregation, classification, processing, and sharing of group-level data [
16]. Other conceptual research on group privacy includes Floridi’s [
17] argument that groups have a right to privacy since they are the central focus of informational technologies (which affects their values and rights) and Loi’s [
18] description of group privacy as either protection of confidential information or limiting inferences that can be made on a group based on individuals’ features or characteristics (i.e., inferential privacy). From a technical perspective, Majeed et al. [
19] list 25 threats to group privacy in the age of big data and AI, further demonstrating why group privacy is a problem worth studying. Recently, group privacy has attracted attention in civil societies (incl. humanitarianism’s use of (geo)data technologies), for example by Raymond [
20], who tackles group privacy by first expounding on the groups of interest—through the concept of demographically identifiable information (DII)—and why it is problematic in terms of group privacy.
Raymond [
20] defines DII as:
“either individual and/or aggregated data points that allow inferences to be drawn that enable the classification, identification, and/or the tracking of both named and/or unnamed individuals, groups of individuals, and/or multiple groups of individuals according to ethnicity, economic class, religion, gender, age, health condition, location, occupation, and/or other demographically defining factors. DII can include, though is not limited to, personally identifiable information (PII), online data, geographic and geospatial data, environmental data, survey data, census data, and/or any other dataset that can—either in isolation or in combination—enable the classification, identification, and/or tracking of a specific demographic categorization constructed by those collecting, aggregating, and/or cross-corroborating the data”.
In this paper, we use DII as the definition of a group. DII allows us to consider both algorithmically determined groups and social groups. Often, the impacts/outcomes of data technologies on algorithmically determined groups are felt disproportionately by existing social groups (e.g., biases) [
11]. More precisely, algorithmically determined groups can correlate with not one but multiple intersecting social constructs (e.g., race, ethnicity, religion, gender, and socio-economic status). DII is thus a “rules-first” method of defining groups and therefore leans towards “its” privacy.
Our objective in this paper is to expand our understanding of the link between location data processing and group privacy in the context of humanitarian action. An understanding of group privacy in this context requires an investigation of the geodata technologies, the groups they represent or form, and the informational harms that arise. Therefore, we first explore the types of geodata used in disaster risk reduction and management (DRRM). Then, using the definition of DII, we explore how these geodata types are used in the context of DRRM to classify, identify, and track communities. For example, remote-sensing images in combination with AI have been used to extract rich sets of DII (e.g., socio-economic characteristics in settlements) used for decision-making. Second, we give examples of threat models that emerge in the group context concerning DII and geodata from the first objective. We categorize these into four broad informational harms: (i) biases from missing/underrepresented categories; (ii) mosaic effect—unintentional sensitive knowledge discovery from combining disparate datasets and AI’s role in facilitating and accelerating data mosaicking which obscures (minute) data problems (e.g., biases) that sum up in the end; (iii) misuse of data (whether it is shared or not); (iv) cost–benefit analysis (cost of protection vs. risk of misuse). Finally, we discuss geodata triage, making comparisons to triage in disaster and the importance of ethics and values. We propose a triage process and give examples of how to prioritize group-privacy risks based on the four informational harms.
These four harms are motivated by the fact that decisions are based on inferences about the group, and these inferences are often prone to biases. There is also a tendency to merge disparate datasets so that they complement each other, creating new information with unprecedented potential for misuse by other parties. All of this, in combination with the unknown costs and benefits of protecting group privacy, creates a situation where violations go undetected. Although we do not offer mitigation strategies, due to the broadness and contextual dependency of these informational harms, we see the urgency and importance of a triage process that can be used to determine how group informational harms should be prioritized for mitigation when faced with resource constraints and an impending humanitarian disaster. There is an inherent tension in (disaster) humanitarianism between fast response, accuracy, and non-maleficence that requires a debate about prioritization and its ethical implications. We build on Raymond’s “demographic threat triage” [
20], which we interpret as a two-stage process of proactively and critically assessing the threats accompanying different datasets and their corresponding application context (threat modeling), followed by assessing the urgency of remedial action for each.
A discussion of group privacy in geodata studies is well overdue, given the potential of location data in analytics and its implications for privacy. In this paper, we hope to bridge the gap between philosophy, ethics, and geodata technologies in the context of humanitarian action for DRRM and contribute to the discussion on what a trustful and ethical use of geodata technologies means. We aim to show humanitarians and geodata scientists (using DII as a way of conceptualizing groups) how to evaluate group-privacy impacts on vulnerable communities.
2. Geodata in Humanitarian Action—DRRM
DRRM includes both anticipatory actions and responses to disasters by humanitarians. Risk in this context is defined as a function of hazard, exposure, and vulnerability (social and/or physical) [
21], and it quantifies the probability of loss/damage of assets in the event of a hazard (see Equation (
1)). Location is central to all three components in the risk equation, and thus important in determining where and to whom to give resources. This explains the ever-evolving variety of geodata types used for DRRM. Incorporating different kinds of geodata in DRRM compensates for the inadequacy of individual geodata types in expressing the demographic characteristics required to estimate the risk a hazard poses to a population.
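For reference, a common operationalization of this definition, which we assume Equation (1) follows, is the multiplicative form
\[
\text{Risk} = \text{Hazard} \times \text{Exposure} \times \text{Vulnerability}.
\]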
In this section, first, we give examples of geodata types used for DRRM, particularly focusing on remote-sensing data and in situ data. Remote-sensing data are strongly highlighted because they typically contain no personal information but are used to infer demographic information via the intersection of the built environment, urban poor or vulnerable communities, and weak infrastructure. As such, while there are many other types of geodata, we only focus on those that directly offer socio-economic and other demographic information relating to exposure, physical vulnerability, and social vulnerability. We therefore exclude hazard-specific geodata sets (e.g., hydrography and topography datasets), since they would only show where a hazard would occur but not how subgroups of vulnerable communities would be affected because of their interaction with the built environment. These geodata typologies become useful in the following sections, which discuss the emerging threats, especially those from geodata mosaicking. Second, we categorize what DII each geodata type represents, distinguishing between exposure, physical vulnerability, and social vulnerability.
Remote-sensing (RS) data are a broad category of geodata used in DRRM which includes satellite images, imagery from unmanned aerial vehicles (UAVs), and street-view images. The use of UAVs and street-view images is nascent in DRRM applications compared to satellites. The main advantages of using RS data are the timeliness of data collection, cost efficiency, and the relative ease of scaling up [
22,
23,
24]. This contrasts with in situ data-collection methods like field surveys, which can be costly and time-consuming when the area coverage is large. RS data capture two important components of the risk equation, namely exposure (i.e., the exposed infrastructure) and physical vulnerability (i.e., the susceptibility of the exposed infrastructure to the hazard). Social vulnerabilities are only implicitly inferred from RS data via proxies from the characterization of the exposed infrastructure and its corresponding vulnerabilities. Social vulnerability thus often requires datasets that complement the RS data, e.g., social protection data, household surveys, and/or census data, which also often contain spatial references [
22].
In situ geodata used for DRRM include—but are not limited to—surveys (e.g., household), census, call detail records (CDRs), and social protection data. In situ data can be point observations with geographical references or spatially aggregated data. For example, household surveys can contain the coordinates of participants’ houses, and CDRs contain the location of a mobile phone user at a particular time. In contrast to other in situ data, CDRs present different ethical problems, primarily due to the lack of explicit consent from data subjects and to claims that anonymizing CDRs resolves the privacy concerns while the group-tracking implications are overlooked [
25]. We therefore find it important to include CDRs in this discussion. In summary, the main advantage of in situ data collection is that it can be tailored to capture specific information that is otherwise unavailable (e.g., specific household vulnerability) but with the downside of costs.
Since each component in the risk equation aims to empirically infer different information, in the following subsections we describe the DII derivable (via the use of proxies) for exposure, physical vulnerability, and social vulnerability estimation. These components are primarily used to infer patterns in the population that make it particularly susceptible to hazards.
2.1. Exposure
RS data give high-resolution information on the exposed elements in an area. These include buildings, transport infrastructure (e.g., roads), and service infrastructure offering critical services (e.g., power/communication lines, water reservoirs/points). Satellite images and UAVs offer a view from above that gives insight into the characteristics and distribution of infrastructure on the ground, while street-view images give a street perspective, e.g., the façade of buildings and the number of floors—which are not observable from above [
26,
27]. Having an overview of what is exposed is a step that precedes vulnerability quantification when determining risk. Street-view images have gained importance in auditing the built environment for post-disaster evaluation [
28] and recovery [
29]. Though the use of CDRs is not common, research notes that they have high potential for future use in DRRM [
30]. CDRs give insights on population distribution and mobility [
30]. These can be assessed before and after a disaster [
31]. Population distribution is important in quantifying exposure, especially when census data are unavailable. In this case, CDRs can give a lower bound on the number of people affected [
30]. CDRs have particularly become a niche geodata type for tracking the spread of diseases as a cascading effect of disasters (e.g., after the 2010 Haiti earthquake) [
30,
31]. The location where diseases occur becomes the point of interest used to classify, track, and identify groups at risk; hence, in this case, location is the primary DII, compared to RS data where infrastructure is the DII.
2.2. Physical Vulnerability
In the DRRM literature, it is anticipated that exposed assets will vary in their susceptibility to different kinds of hazards. This leads to the physical vulnerability characterization of communities’ exposed assets. This characterization is inferred via proxies such as building density, road network, façade building materials, building heights, and roof types. For example, Harb et al. [
32] note that roof types can be used as a proxy for physical vulnerability, as different roof types have differential susceptibility to hazards, while heterogeneous building heights in a neighborhood increase the physical vulnerability of an area in the case of earthquakes. Similarly, building-façade characterization is important for earthquake, tsunami, and flood physical vulnerability assessments [
23,
26,
27,
33].
2.3. Social Vulnerability
In a review of RS data proxies in disaster management, Ghaffarian et al. [
22] elaborate on how roof typology, land-use ratio, green spaces, road widths, and densities can be used to infer socio-economic vulnerability. Roof typology as a social vulnerability proxy (extractable from RS data because different materials reflect light differently) assumes that roof materials are indicators of income or wealth, since it costs more to construct buildings with quality roofing materials [
22]. However, this assumption is dependent on the locational context. The quality of roads (density/connectivity and potholes) is used as a proxy since deprived areas often have poor road connectivity [
22]. Land-use ratio and available green spaces are also proxies for social vulnerability, as green spaces are often scarce in low-income areas/neighborhoods [
22]. Similar to roof typology, building typology derivable from street-view images is also considered a proxy for social vulnerability, as low-income households and business owners often live in or rent buildings with high physical vulnerability to disasters simply because they cannot afford otherwise. However, these building typologies are usually considered weak proxies for social vulnerability [
22]. It is also important to note that street-view images usually capture more than just the building, and can include people walking on the streets and the license plates of cars. These are usually blurred before sharing or using in maps (e.g., Google Street View) but interestingly, some homeowners are averse to having the images of their houses captured in the first place [
34]. As mentioned before, RS data have limits in quantifying social vulnerability, and this is where in situ data become very important. In situ data-collection methods are usually tailored to directly capture demographic variables that are direct factors in vulnerability. For example, household survey and social protection data would capture information such as income, occupation, household size, and composition. For CDRs, the main utility in the context of social vulnerability lies in tracking movement in and out of a disaster zone; they become especially useful in monitoring the spread of infectious diseases (i.e., a disease outbreak in one area can spread to the areas to which the affected migrate) [
30,
31]. In situ datasets also serve as ground-truth data for RS data.
In the conversation on informational privacy and harms, it does not suffice to examine data typologies without also investigating the role of the technologies/methods used to process data into meaningful, actionable information. For example, large-scale satellite RS imagery is processed using AI (e.g., building detection/classification), and this is what makes satellite data actionable. In the section that follows, we briefly discuss the complexities that AI presents for privacy. This is an important precursor for the threat models section.
3. Geodata Technologies: Artificial Intelligence (AI) in Humanitarian Action
AI has gained much attention and accelerated adoption in humanitarianism. The shift from a reactive to an anticipatory approach in humanitarianism is arguably one of the reasons why AI has become useful. Anticipatory approaches using AI and data (both small and big) facilitate the prediction of the needs of vulnerable communities before a disaster occurs, through impact-based forecasting (i.e., pre-disaster impact assessment on communities through DRRM) and forecast-based financing (i.e., early access to funds before a disaster to mitigate the predicted impact) [
35,
36]. The use of AI is now very much ingrained in RS methods, in the two broad categories of object-based and pixel-based methods. It is used to map settlements, detect damaged buildings, and audit the built environment for post-disaster recovery.
Figure 1 demonstrates building detection and delineation using AI, while
Figure 2 shows the classification outcome of a UAV image.
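As a minimal sketch of the pixel-based category of methods, the following Python fragment trains a classifier on synthetic spectral features; the band values, class names, and choice of model are illustrative assumptions and not the workflow behind Figure 1 or Figure 2.

```python
# Minimal sketch of a pixel-based roof/land-cover classifier on synthetic data;
# purely illustrative of the pixel-based category of RS methods described above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)

# Synthetic "pixels": 3 spectral band values per pixel, 3 hypothetical classes.
n_per_class = [1000, 1000, 1000]
means = np.array([[0.6, 0.5, 0.4],   # metal roof
                  [0.4, 0.35, 0.2],  # thatched roof
                  [0.2, 0.5, 0.25]]) # vegetation
X = np.vstack([rng.normal(m, 0.05, size=(n, 3)) for n, m in zip(n_per_class, means)])
y = np.repeat([0, 1, 2], n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Each pixel is classified independently from its band values (pixel-based method);
# object-based methods would first segment the image into objects such as buildings.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test),
                            target_names=["metal roof", "thatched roof", "vegetation"]))
```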
The use of AI raises concerns of bias and data privacy in humanitarianism. However, the discussion of these two ethical concerns is often focused on the individual (for the latter), and the two are assumed to be unrelated. The link between biases and privacy is not adequately explored in the ethics of the humanitarian use of geodata technologies, with only implicit references to human rights law: privacy as a human right and protection from discrimination (e.g., by Beduschi [
1] and by Pizzi et al. [
37]). On biases, examples abound, and recently the neutrality and fairness of AI (including the data used to train AI models) have been called into question, e.g., in predicting recidivism [
38] and in recruitment processes [
39]. AI and data are not neutral tools and are known to include and amplify biases [
40]; biases from the data collection and biases of the designers (algorithmic bias) [
41]. This is further compounded by the black-box nature of most AI algorithms, which makes it difficult to understand how the algorithm reaches its decision [
42]. Mehrabi et al. [
43] conduct a comprehensive survey of biases in AI, describing the types of biases and how they occur. Bias concerns thus become entwined with group privacy (because of AI’s ability to learn, amplify, and reinforce biased patterns), and these harms affect the group and not just an individual.
4. Threat Models
Threat models describe the privacy challenges (geo)data technologies present. It is becoming increasingly important to conduct privacy threat modeling, considering the fast evolution and adaptation of sociotechnical systems across all fields with unprecedented privacy risks and harms [
44]. Threat models go beyond the checklist procedure typical of privacy impact assessments, which Shapiro [
45] finds inadequate for tackling informational harms. Thus, threat modeling can range from reflections on what could go wrong to sophisticated simulations of how a sociotechnical system affects privacy. In this article, we rely on reflections on informational privacy concerns, based on informational harms already identified for data technologies using AI.
As mentioned before, there has been a tendency to focus on individual privacy, which does not cover the informational privacy issues that accompany the use of data technologies, especially in the age of AI; mainly, these are the informational harms that arise from the creation of group-level data and categorization. Literature on the ethical uses of data technologies for humanitarian action often mentions bias as a concern. Recently, there has been a call to also consider the mosaic effect, i.e., new knowledge accrued from the combination of disparate datasets (see Capotosto [
46]). Privacy literature argues that though respective datasets may seem harmless (with few privacy concerns), the combination of these creates new data products with information and inferential opportunities beyond what was originally intended [
45,
47]. We observe that geodata, specifically coordinates and boundaries (spatial references), together with AI make it relatively easy to combine (layer disparate datasets), process, analyze, and perform predictive tasks. It is also prudent to consider other (mis)uses of the data acquired for humanitarian action. Furthermore, threat models must weigh the cost of preventing harm against the probability of informational harm; the costs of preventing harm may outweigh the probability of some informational harms.
In light of the issues discussed above vis-à-vis (geo)data technologies and AI, we limit the discussion on informational harm to four threats. The first is biases from missing/underrepresented categories. Second, the mosaic effect—unintentional (sensitive) knowledge discovery from combining disparate datasets, and AI’s role in facilitating and accelerating data mosaicking, which obscures (minute) data problems (e.g., biases) that sum up in the end. Third, misuse of data (whether it is shared or not). Fourth, cost–benefit analysis (cost of protection vs. misuse risk). These threat models are technology-agnostic and can be investigated for any geodata technology used in humanitarian action. To the best of our knowledge, there are no other studies that conduct group threat modeling for geodata technologies in humanitarian action.
4.1. Biases (From Missing Categories or Missing Data on Entire Groups or Misclassification)
Considering the types of geodata commonly used in humanitarian action, and the use of AI for anticipatory disaster management, it is evident that informational privacy in this context leans towards the collective. Remote-sensing data, especially, contain very little to no personally identifiable information, but in combination with AI, they are used to create categories of people with differing needs for aid (e.g., characterizing physical vulnerability by classifying roof typologies). Using geodata and AI as categorization technologies then leads us to consider biases as informational privacy harms, since underrepresentation, misclassification, and ill-constructed categories affect the collective sharing common DII attributes. There is no doubt that these biases trickle down to the individual, but this effect is only possible via the use of the DII in question [
16]. Viewing biases as violations of privacy is in line with other informational privacy research. For example, from a legal perspective, Crawford and Schultz [
48] list discrimination as one of several informational harms from “predictive big data” (where predictive big data refers to both the data and the AI technology). Nor should we forget the implications for autonomy, which is fundamental for informational privacy [
9,
49]. The categorization that is characteristic of (geo)data technologies considered here provides no autonomy for individuals and groups to decide whether they want to be part of the group [
50].
In the following paragraph, we consider an example of using remote-sensing images to identify and delineate buildings, a task commonly performed in data-driven humanitarian work. Buildings are not always homogeneous in size, roof type, and spacing (see example in
Figure 1). Given the trend of using remotely sensed data and AI to generate maps of vulnerable communities, it has emerged that such complexities (i.e., size, roof typology, and spacing among buildings) can lead to biases in the generated maps. Notably, these attributes are proxies for demographic aspects (e.g., socio-economic status) and hence DII.
A specific case is where a humanitarian organization is interested in mapping an area prone to flooding. Assuming the only geodata available to humanitarians in this scenario are satellite or UAV images, which only give an overhead perspective of the area, the mapping exercise is constrained to identifying buildings via their roof types. Heterogeneity in roof typology complicates automated mapping using AI, especially if some types occur rarely. Such sample imbalance often results in biased classifications in mapping using AI (e.g., Abriha et al. [
51]). Standard practice in experimental settings is often to remove classes with few samples to improve overall accuracy, but this would not suffice in humanitarian DRRM situations. Although roof typologies are not social groups per se (i.e., not a social construct like ethnicity), they are strong proxies (in some contexts/locations) for socio-economic status, and thus there is a possibility of discrimination if some roof typologies become invisible or are misclassified. This would affect the visibility of the corresponding households. Misclassification therefore creates a situation where humanitarians must worry about which other biases are amplified. For example, some roof types may correlate with informal settlements that are often inadequately served by government functions. This scenario can be extrapolated to the use of street-view images for the identification and classification of buildings.
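To make the imbalance point concrete, the following is a minimal, hypothetical sketch (synthetic features and invented class proportions) of how a standard classifier can report high overall accuracy while largely missing a rare roof type.

```python
# Sketch of how sample imbalance can hide bias against a rare roof type: overall
# accuracy looks acceptable while recall for the minority class collapses.
# Synthetic, hypothetical data; class names and proportions are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(1)

# 95% "concrete roof" samples, 5% "makeshift roof" samples with overlapping features.
n_major, n_minor = 9500, 500
X = np.vstack([
    rng.normal([0.50, 0.50], 0.15, size=(n_major, 2)),  # majority roof type
    rng.normal([0.62, 0.58], 0.15, size=(n_minor, 2)),  # minority roof type
])
y = np.array([0] * n_major + [1] * n_minor)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
pred = LogisticRegression().fit(X_tr, y_tr).predict(X_te)

print("overall accuracy:", round(accuracy_score(y_te, pred), 3))
print("recall, majority roof type:", round(recall_score(y_te, pred, pos_label=0), 3))
print("recall, minority roof type:", round(recall_score(y_te, pred, pos_label=1), 3))
# High overall accuracy can coexist with near-zero recall for the rare roof type,
# i.e., the households it proxies become invisible in the resulting map.
```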
Biases do not only occur during classification. They can occur during data collection or the data-generating process. Mehrabi et al. [
43] give a comprehensive survey of the different ways biases materialize in machine learning. Certainly, biases can also occur in in situ data-collection methods.
It is important to note that the conversation on biases, irrespective of the data technology used, cannot and should not be separated from issues of power (imbalance) and (in)justice. In this case, the power lies with the humanitarians; they decide which communities to focus on and which approaches to take. It is not certain whether humanitarians actively involve vulnerable communities in all stages of designing AI methods for anticipatory action.
4.2. Mosaic Effect
The mosaic effect is a phenomenon that occurs when seemingly unrelated datasets or pieces of information are combined to create new information [
46]. Although each individual dataset may seem harmless [
47], combining them, especially through analysis, allows the creation of a new information product that reveals more about an individual/population than originally intended, as “boring” data points gain more significance [
45]. Adriaans [
52] links mosaicking to the additivity concept where two datasets give more information than one. Though the mosaic effect is often discussed from an individual privacy perspective with the harm being re-identification (e.g., linkage attacks such as in Netflix data [
53] or DNA genotype-phenotype data [
54]), new data mosaics may have far-reaching implications for group privacy. In the context of humanitarian action, this line of research is still nascent, with literature only highlighting mosaic-effect concerns about open humanitarian datasets mosaicked with social protection data [
46]. An example showcasing this trend of combining social protection data with other geodata comes from Malawi [
55].
In geodata, direct (e.g., coordinates) and indirect (e.g., addresses or administrative boundaries) spatial references [
56] are the medium that facilitates linking disparate datasets. This makes geodata distinct from other kinds of data in how they are combined, as location becomes the key by which geodata are merged. For example, satellite and UAV images can be linked with street-view images, social protection data, and surveys or census data. As mentioned before, integrating socio-economic data compensates for the shortcomings (sometimes weak proxies) of some remote-sensing data, and this is only possible through intersecting the locations of the various datasets. This is evident in anticipatory action for disasters in humanitarianism, e.g., mosaicking UAV images with street-view images and social protection data. Although UAV and street-view images combine to show physical vulnerability due to a hazard, social protection data complement this by showing the social component of vulnerability (e.g., income and health). Depending on the granularity of the mosaicked data (e.g., from administrative boundaries to household level), this can reveal patterns in the population, such as clusters with incidences of certain diseases. This information is important for humanitarianism but is a group-privacy threat if accessible to the wider public.
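A minimal sketch of such location-keyed mosaicking is given below; the datasets, column names, and values are invented for illustration only.

```python
# Sketch of data mosaicking keyed on a shared spatial reference: building typologies
# inferred from imagery are merged with hypothetical social protection records via an
# administrative-unit code, yielding group-level DII that neither dataset contains alone.
import pandas as pd

buildings = pd.DataFrame({           # e.g., derived from UAV images with AI
    "admin_unit": ["W01", "W01", "W02", "W02", "W03"],
    "roof_type":  ["thatch", "metal", "metal", "thatch", "thatch"],
    "flood_damage_index": [0.8, 0.3, 0.2, 0.7, 0.9],
})

social_protection = pd.DataFrame({   # e.g., in situ registry aggregated per unit
    "admin_unit": ["W01", "W02", "W03"],
    "median_income": [420, 1150, 300],
    "chronic_illness_rate": [0.21, 0.08, 0.27],
})

# Location (the admin-unit code) is the key that makes the mosaic possible.
mosaic = buildings.merge(social_protection, on="admin_unit")

# Group-level inference: expected damage and health burden per roof-type group.
profile = (mosaic.groupby("roof_type")[["flood_damage_index", "chronic_illness_rate"]]
                 .mean())
print(profile)
```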
The role of AI in data mosaicking cannot be overstated. Each dataset may have its own information extraction process. For example, with satellite or UAV images, the task may be to classify built areas, map green spaces, or classify parcels of land into use categories (e.g., industrial, farm, residential). Street-view images, on the other hand, can be used to classify building typologies, while other datasets like household surveys give insights into the socio-economic and behavioral patterns of communities. All these tasks are increasingly carried out by AI, which makes the informational flow easier and faster for decisions. Ultimately, this creates a rich set of detailed DII. If the process from data collection to AI-based processing is repeated with some regular frequency, it becomes quintessential surveillance with immense knowledge accrual on vulnerable communities. Whether or not this knowledge is shared with people or organizations with similar “do no harm” principles does not settle the question of whether so much knowledge on vulnerable groups should be gathered and stored in the first place without first understanding the risks [
20].
As an example, consider a flood-prone area where, for aid delivery, beyond predictions of the amount of rainfall or flood level, a humanitarian organization would also need to know what is exposed and what the physical and social vulnerabilities of the community are (i.e., can they cope?). Should they also worry about disease outbreaks that commonly accompany flooding? No single dataset suffices to give information on all these aspects. RS data, with the use of AI as discussed previously, are used to detect and classify buildings, and based on this characterization, a damage index can be developed for the various building typologies. But that is about as much information as is extractable from RS data. For social vulnerability, humanitarians in this case would turn to in situ geodata (e.g., social protection data) that give information on how well the community can cope with the disaster. These data would contain indices such as household income, level of education, health, etc. Combining RS data and social vulnerability geodata, humanitarians can make group-level inferences, for example, the socio-economic and vulnerability status of the various building typologies. Such inferences are consequently used to decide the amount of aid to give to various groups. Adding CDRs to these datasets enables tracking: where are the various groups of affected people moving to? The resulting mosaic DII exceeds the potential of each individual dataset typology, with potential for other (mis)uses. Even though it might not be fine-grained DII, from a broader perspective, it gives information on groups. Furthermore, if there are biases in each geodata typology and the resulting inferences, then these problems sum up and result in a biased DRRM process.
4.3. Misuse—(Known) Blind Spots: What Else Could These Data Be Used for?
Data misuse is simply using data for purposes that were not originally intended. In individual informational privacy, this is framed as uses that individuals did not explicitly consent to. The informational harms towards individuals and groups are covert surveillance and unwanted solicitation [
57], for example through personalized ads and marketing strategies that nudge people to spend on items, vote for or against someone/something, or generally behave in a certain way. Inferences that allow for this targeting are made possible by DII, which allow for the categorization and tracking of groups in the first place. In modeling misuse threats of geodata technologies, one must consider other parties’ interests (e.g., governments and organizations). Context and socio-political dynamics also factor in. For example, in countries with civil strife, the concern is that geodata collected by humanitarians could be used for surveillance by oppressive governments or malevolent groups. With experimental technologies, modeling threats due to misuse requires quite a bit of foresight; otherwise, one must rely on past incidents as examples—as we do here.
The geodata and geodata technologies discussed in previous sections not only provide useful DII for DRRM applications but also have value in other applications. Remote-sensing images will inevitably capture or contain other objects of interest for other applications. Hence, there is a concern that geodata, and the resulting DII, acquired from vulnerable communities for anticipatory action could be used for other purposes that do not align with the humanitarian agenda. Examples include, but are not limited to: (i) detailed temporal geodata being used for surveillance of vulnerable communities by malevolent parties; (ii) geodata being used in the marketing and service industries; and (iii) upgrading programs that can lead to displacement or gentrification, for example, in informal settlements. Raymond et al. [
58] give the example case of Project Sentinel, where temporal satellite data were used to analyze the progression of conflict in the Sudan. The DII in this case were vehicle sizes used to identify military vehicles and house clusters showing settlements. The temporal satellite data did not meet the humanitarian objective but instead may have unintentionally served the antagonists with real-time intelligence through the sharing of near-real-time information and analysis [
58]. Raymond et al. [
58] further remark that this might have turned “everyone” into a witness. This example shows how sharing information in a conflict setting can be harmful despite the good intentions of humanitarian work.
RS data are also used in other development agendas, and these are usually multi-stakeholder (e.g., governments and private investors). RS data have uses for poverty mapping [
59]. This is useful in determining where to focus resources for development. Although development is advantageous at face value, critical studies ask for whom it is advantageous. Communities living in such deprived areas are often concerned about displacement without alternatives [
59]. If the outcome is improved housing or hazard-resilient neighborhoods, it could end up being out of financial reach for the original inhabitants, leading to gentrification. This is no doubt a social justice issue.
In sum, threat modeling for misuse not only shows that we need to consider other uses but also requires a review of whom to trust, which can be done by considering the compatibility of principles/values. In the case of sharing data, humanitarians are urged to “share alike”, i.e., share information with others who subscribe to the same “do no harm” principles [
46,
60]. This means that trust is needed among humanitarian organizations and between organizations and communities that share geodata containing DII. Due diligence cannot be over-emphasized to ensure geodata and geodata technologies are not misused to disadvantage vulnerable communities.
4.4. Cost–Benefit Analysis
Cost–benefit analysis in this context considers the cost of protecting group information versus the potential for misuse. Cost–benefit here is framed not in economic terms but in terms of values. If the costs of protection outweigh the potential for misuse, then there is no incentive for protection, leaving data subjects vulnerable to informational harm. With regard to group privacy and DII, the risks are not usually perceived to be as grave as those posed to an individual. This is not necessarily true, especially in the context of (geo)data technologies and AI, where the risks of collecting and sharing DII are not immediately clear and might be mistaken for non-existent or minimal risks. Although there is no evidence to show that protecting group information is less important than protecting individual information, this perception can cause laxity, and the potential for informational harm then goes undetected and unchecked. Reactiveness, as opposed to proactiveness, in threat modeling of group informational harms reinforces—if not increases—the potential for informational harms as more varieties of geodata and geodata technologies are used.
In humanitarianism, there is an inherent tension between visibility and privacy from the beneficiaries’ point of view [
61]. Although generating knowledge of the vulnerability of groups is problematic in this case, it is needed for aid resource allocation. The issue to consider is whether the discussion is balanced rather than leaning towards a lax stance, i.e., a lack of threat modeling and triage while only looking at the knowledge-discovery potential. Since we do not yet have a comprehensive understanding of the informational harms posed to groups (they are context-specific), we cannot claim that the benefits always outweigh the risks.
5. Geodata Triage
Raymond [
20] advocates for continuous “demographic threat triage” as a mechanism to identify and keep abreast of the evolving harms related to DII collected and processed by civil society (incl. humanitarianism). Triage here is thought of as an independent step following threat modeling (the identification of possible risks and harms), since triage can only be done after understanding the harms. In this section, we build on the scholarship on triage, from its origin on the battlefield to its prominent use in emergency medicine. This scholarship emphasizes that triage is useful when situations compete for attention amid a lack of resources, e.g., [
62]. Furthermore, revisiting our threat models discussed above, we give examples of triage scenarios with respect to DII and context.
A set of objectives determines how to prioritize resources. Mitchell’s [
63] description of the evolution of triage—from prioritizing treatment for soldiers’ quick return to the battlefield during war to prioritizing injured civilians—shows how institutions shape triage objectives or policies. Therefore, triage is not based on individual judgment alone. For instance, in emergency medicine, incoming patients at a hospital undergo an initial assessment of the “severity of illness or injuries”, prioritizing resources (e.g., time, equipment, and personnel) for the most urgent cases [
64], and this criterion is formulated at the administrative healthcare level but can also be subject to individual choices [
62].
Societal and institutional values also shape triage objectives [
65]. As argued by Zach [
66], DRRM in humanitarianism is essentially a triage process with its own values and ethical considerations: some communities may require more attention than others, and some may be in urgent need of support while others need less or none. The different levels of attention required can and should also be mirrored in the way data are handled. Thus, there is a need to consider a form of “data triage”. Therefore, besides triaging vulnerable communities at risk of disasters and assessing the data needs for decision-making, we argue that it is prudent to also undertake informational privacy triage. Ideally, all kinds of geodata should be handled with the utmost care, but resource constraints (e.g., time and personnel) during anticipatory preparations and response compel humanitarians to prioritize some data types and analysis approaches over others. The triage objective hence also includes the identification and prioritization of geodata technologies based on how much they impact humanitarian values, specifically (based on our threat models) impartiality and humanity (preserving dignity).
Similar to medical triage we find it useful to categorize geodata technologies into intervention, observation, and non-action based on informational harms from DII (see
Table 1). We introduce a triage process (see
Figure 3) that focuses on contextualization of DII and group-privacy threats from geodata technologies and terminates when the geodata and geodata technologies are classified as “Non-action”, “Observation” or “Intervention”. This process allows for re-assessment should the context and use status change.
At the start of the triage process, geodata and geodata technologies are assessed for DII implications (i.e., which DII can be extracted from the use of the geodata and geodata technologies). If there are no implications for DII, there are no urgent use cases, and due diligence has been done (e.g., geodata are securely stored), then the geodata and corresponding geodata technology can be classified under “Non-action”. Otherwise, the process proceeds to aligning group-privacy threats (e.g., from
Section 4) with the DII. The next step is context analysis, where both the DII and the group-privacy threats are assessed through the lens of the local context (e.g., political stability). Context analysis could be a combined effort among local disaster management officials, first responders, and the community involved. All three actors should analyze how local socio-economic and socio-political dynamics could lead to the DII being used negatively. We emphasize contextualization in the triage process because of the dynamic use of DII and the multiplicity of group-privacy informational harms. DII in a conflict context could be used to target and inflict harm on vulnerable communities (an example case is Project Sentinel, discussed by Raymond et al. [
58]). If the DII has contextual relevance, then the next step is assessing the DRRM phase (e.g., response or preparation), since the DRRM phase determines the frequency and urgency of the use of geodata and geodata technologies. Active use determines whether the geodata and geodata technologies that produce the DII are classified under the “Observation” category or the “Intervention” category. Active use of geodata and geodata technologies requires that intervention measures be taken. However, if the geodata technologies are not in active use in DRRM but mosaicking and data processing have begun, then the geodata and technologies can be monitored under the “Observation” category; “Observation” contains geodata with DII that do not immediately warrant intervention but should nonetheless be monitored. Otherwise, if the geodata technologies are not relevant to the context and are not in use, and due diligence has been followed, then they are categorized under the “Non-action” category. The “Non-action” category means the geodata are stored securely and not in use, and therefore the threats discussed above do not apply. At each step of the triage process, there should be continuous reference to societal and institutional values and ethics.
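Read as a decision flow, the process can be summarized in a short sketch; the following Python fragment is an illustrative reduction in which the field names and the fallback rules are our assumptions rather than a prescribed rubric.

```python
# Sketch of the triage flow described above, reduced to one decision function.
# Fields and rules are illustrative; in practice each step would be a joint
# assessment with local officials, first responders, and the community.
from dataclasses import dataclass

@dataclass
class GeodataTechnology:
    name: str
    yields_dii: bool              # can DII be extracted from this data/technology?
    due_diligence_done: bool      # e.g., geodata securely stored, access controlled
    contextually_sensitive: bool  # could local socio-political dynamics misuse the DII?
    in_active_use: bool           # in use in the current DRRM phase (e.g., response)?
    mosaicking_started: bool      # combination/processing with other datasets begun?

def triage(t: GeodataTechnology) -> str:
    # Step 1: no DII implications, no urgent use case, due diligence done.
    if not t.yields_dii and not t.in_active_use and t.due_diligence_done:
        return "Non-action"
    # Steps 2-3: align group-privacy threats with the DII, then assess local context.
    if t.contextually_sensitive:
        # Step 4: the DRRM phase / active use determines urgency.
        if t.in_active_use:
            return "Intervention"       # active use requires mitigation measures
        if t.mosaicking_started:
            return "Observation"        # monitor; not yet in active DRRM use
    # Not contextually relevant and not in use: store securely, re-assess if context changes.
    return "Non-action" if t.due_diligence_done else "Observation"

print(triage(GeodataTechnology("UAV roof classification", True, True, True, True, False)))
```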
It is also important to consider the movement between categories. We suggest that location context and evolving situations on the ground play a role in category movement. For example, if a disaster-prone area suddenly and unfortunately became contested territory between two rival countries, then any geodata and processes that allow for the aggregation, categorization, and tracking of groups would become even more sensitive. Because such scenarios are hard to predict, it becomes ever more important to continuously triage geodata. This is reflected in the triage process (
Figure 3) with re-assessment from the context analysis step.
Recall the example of classifying buildings by roof type from UAV data, with the outcome being a map of the affected area dictating where to concentrate resources. If AI is the primary method for classification, then bias as an informational harm trumps the other harms. The priority becomes determining which demographic groups are of interest in the classification task. If the harm is bias against a vulnerable group, then solving or mitigating the bias takes precedence over novel knowledge creation and harm from misuse. Given the urgency of humanitarian work, this ensures that the task is completed correctly and aid resources are allocated appropriately. Misclassifying roof types or buildings characteristic of a vulnerable demographic group would be a disastrous harm, as it would lead to inadequate aid. If the risks of bias are minimal but the greater risk is misuse of the demographic data (e.g., tracking of vulnerable groups by malevolent parties), then this case trumps general novel knowledge creation on groups. The mitigation in this case is careful consideration of disclosure, sharing, and what else the data could be used for, including which other datasets they could be mosaicked with.
6. Discussion
Defining groups is certainly still a challenge in group-privacy scholarship. In this paper we used the concept of DII coined by Raymond [
20], which, in sum, is any information that can be used to classify, identify, or track groups of people. We highlight that geodata are particularly prone to exposing DII and therefore require special attention in how they are made actionable. Sampling a variety of commonly used and novel geodata types in humanitarianism for DRRM, we set out to understand how they are or can be used to classify, identify, and/or track people. Remote-sensing data—which include satellite images, UAV images, and street-view images—are most commonly used to classify the physical vulnerabilities of infrastructure (esp. buildings). For example, roof types and building sizes have been documented as useful DII in characterizing physical vulnerabilities, but these often need to be complemented by in situ data types (e.g., household surveys). CDRs, on the other hand, are an emerging geodata type specifically used to track movements in and out of disaster areas, with particular interest in the spread of infectious diseases.
The second objective was to conduct threat modeling for group privacy. Since geodata technologies used in disaster humanitarianism are used to classify or discriminate between groups that need aid, biases emerged as one of the major informational harms. Biases not only undermine the impartiality principle in humanitarianism and dampen trust between organizations and the communities they serve, but may also divert aid from where it is needed most. Biases may arise from misclassification and from biased data (in the case of in situ data). There is also concern about the extent of new knowledge generation on vulnerable communities that comes from mosaicking disparate datasets. What emerges is that location makes it easy to merge disparate datasets and AI makes it easier to process such data. However, such automation with methods that are considered black boxes is prone to amplifying data problems. We also reflect on what else these geodata technologies could be used for outside of the humanitarian context. The same data used to decide whom to give aid may very well be used to decide whom to evict from disaster-prone areas or contested land. Group privacy thus far has not received the attention it deserves compared to individual privacy. The cost–benefit analysis of group privacy needs detailed investigation since, so far, the costs seem greater than the reward. In sum, threat models need to evolve as geodata technologies evolve.
Though we do not yet offer solutions or mitigation strategies for the group informational harms discussed above, we do suggest a triage process. We argue for the importance of triage (by considering Zach’s argument for triage in disaster response [
66]) given the multitude of group-privacy informational harms and the constraints on resources to mitigate the threats. The proposed triage process emphasizes the contextualization of DII with their corresponding group-privacy threats. Drawing from triage in emergency medicine, we also find it useful to categorize geodata and geodata technologies under “Intervention”, “Observation”, or “Non-action”. This categorization is a continuous process that allows geodata technologies to be reclassified as use cases and contexts evolve. This reclassification (moving geodata technologies among categories) reinforces the need for humanitarians to be vigilant with geodata on vulnerable communities. Since humanitarian organizations have multiple institutional levels, we envision a triage process with constant communication among the various institutional levels (i.e., the institution, local response management teams, and first responders). The overall responsibility for securing geodata not in use (under the “Non-action” category) would lie with the institution as a whole.
What we find particularly difficult with triage is the metrics to use in the categorization. Triage metrics raise a variety of questions. For example, what are the threshold levels for group informational harms or risks to be categorized as “Observation” instead of triggering an intervention process? In the threat modeling (specifically the bias subsection), if the AI model is known to misclassify 4 in every 10 houses, would this be enough to warrant an intervention under time constraints to save lives? How should humanitarian organizations measure the potential for geodata technology misuse? Answering such questions is instrumental in developing a triage rubric, and an agnostic rubric might not generalize well to evolving group informational harms. The triage rubric should ideally be specific to the context and geodata technology.
The objectivity of the triage brings to the forefront the power of institutions and individuals in decision-making. For accountability, triage processes should be transparent to vulnerable communities (i.e., it should be made known why certain geodata and informational harms are prioritized over others). An important lesson from our attempt at triaging is the need to contextualize the harms. Asking which group is at risk, and why, gives insights into how to prioritize group privacy. The example triage given above may, however, not be optimal in other contexts. Therefore, we see applications for “contextual integrity” [
67] in this group-privacy triage.
7. Conclusions
The humanitarian field has increasingly leveraged (geo)data technologies, especially in DRRM, to determine where, when, and to whom to give aid (e.g., the use of early-warning, early-action, and forecast-based financing systems). Humanitarians always strive to abide by their core principles (humanitarianism is governed by the four core principles of humanity, impartiality, neutrality, and independence [
4]), and this translates into how they process information on vulnerable communities with respect to privacy, mainly because of the potential for the information to be used for non-humanitarian purposes by malevolent groups. However, privacy concerns are focused on personal information (e.g., names of refugees), while, for example, the location of camps and basic aggregate demographic information would still be available to the public. Moreover, it has emerged that geodata leveraged by humanitarians for DRRM purposes do not necessarily contain personal information but still have potential for harm. This is the premise of group privacy, more so when the objective and the norm of geodata technologies is to generate groups to aid in decision-making [
16].
We therefore set out to unravel the group-privacy harms foreseeable from geodata technologies commonly used in humanitarian action, specifically DRRM. We leveraged the concept of DII, which is concerned with information that can be used to classify, identify, or track groups. Particularly focusing on geodata (e.g., remote-sensing images) and the use of AI to analyze these, we explored four potential informational harms. These were (i) biases, (ii) the mosaic effect, (iii) misuse, and (iv) cost–benefit analysis.
One of the debates on group privacy is how to define groups in the first place, and this has implications for whether to aim for “their privacy” or “its privacy” [
16]. From the example use cases of geodata technologies, we note that the groups are formed based on rules that are predefined and change based on context, e.g., the use of remote-sensing images to classify buildings by roof typology or building materials. These are not the typical demographic groupings (gender, age, etc.) but rather serve as proxies for other contextual attributes of interest. In the case of flooding, different buildings characterized by their building materials will incur differential damage. Such background knowledge of the susceptibility of buildings to hazards dictates how to classify these buildings. However, these are buildings that households live in and therefore correlate with other socio-economic and cultural dynamics. Therefore, using DII to define groups means focusing on “its” (i.e., the group’s) privacy rather than “their” privacy.
Biases from data technologies in general are a rising concern, especially where AI is involved in making decisions that affect people. Using the example of roof typology as DII and how biases can emerge during classification due to underrepresentation, we demonstrated how these can perpetuate existing social inequities when applied to the humanitarian agenda. Furthermore, there is a persistent concern about what other new DII could emerge from combining disparate geodata with AI. There is the risk of these DII creating new knowledge with the potential for misuse outside the scope of humanitarian work. The fact that the cost–benefit analysis (cost vs. the potential for misuse) of group privacy is currently considered secondary to personal information privacy is a major informational-harm concern. The potential for misuse of DII is, in some contexts, equally if not more concerning than threats to individual privacy. In light of this, geodata triage becomes an important aspect of prioritizing geodata technologies and contexts for group-privacy preservation. We therefore present a geodata triage process that takes as input the DII, the geodata technology in use, and the accompanying threat models. The triage process emphasizes the contextualization of geodata technologies and culminates in classifying them as belonging to the “Intervention”, “Observation”, or “Non-action” category. Future research in geodata studies and humanitarianism should focus on developing a robust rubric for the triage process, taking into account the wide variety of geodata types and contexts that apply to humanitarianism.