Uncertainty in XAI: Human Perception and Modeling Approaches
Abstract
1. Introduction
2. What Is XAI?
2.1. Why XAI?
2.2. Requirements of Explanations
2.3. Categorization of XAI Methods
2.4. Predictive Performance vs. Explainability
2.5. Human Perception and XAI
2.6. Evaluation of Explanations
3. What Is Uncertainty?
3.1. Uncertainty in ML
- Aleatoric: Irreducible, due to the non-deterministic nature of the input/output dependency and random noise in the available data. As an example, imagine a self-driving car that relies on various sensors (cameras, LiDAR) to perceive its surroundings and make navigation decisions. Sensor data can be inherently noisy due to factors such as bad weather conditions, sensor limitations, or temporary occlusions. This noise in the input data translates to uncertainty in the car’s perception of the environment and ultimately in its predictions about safe navigation paths.
- Epistemic: Reducible through additional information about the perfect predictor $f^*$. It comprises two parts:
  - (a) model uncertainty: How close is the hypothesis space $\mathcal{H}$, with its best model choice $h^*$, to the perfect predictor $f^*$? This component is very difficult to describe and is often ignored by assuming that $f^*$ is contained in the hypothesis space $\mathcal{H}$;
  - (b) approximation error: How close is the learned predictor $\hat{h}$ to the best hypothesis $h^*$? This error vanishes as the size of the training data increases indefinitely. A minimal numerical sketch of the aleatoric/epistemic decomposition follows this list.
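The decomposition above can be made concrete with the law of total variance: averaging the members' predicted noise variances estimates the aleatoric part, while the spread of their predicted means estimates the epistemic part. The NumPy sketch below assumes a hypothetical ensemble whose members each output a predictive mean and variance for the same test input; all numbers are synthetic placeholders.

```python
import numpy as np

# Hypothetical ensemble: each of M members predicts a mean and a noise
# variance for the same test input (e.g., heteroscedastic regressors).
rng = np.random.default_rng(0)
M = 10
means = rng.normal(loc=2.0, scale=0.3, size=M)   # per-member predictive means
variances = rng.uniform(0.1, 0.2, size=M)        # per-member noise variances

# Law of total variance: total predictive variance splits into the average
# predicted noise (aleatoric) plus the disagreement between members (epistemic).
aleatoric = variances.mean()   # E[sigma^2]
epistemic = means.var()        # Var[mu]
total = aleatoric + epistemic

print(f"aleatoric={aleatoric:.3f}, epistemic={epistemic:.3f}, total={total:.3f}")
```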
3.2. Uncertainty in XAI
4. Modeling Uncertainty in XAI
4.1. Approach 1: Perturbed Input (Variation in x)
Pixel Flipping
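Pixel flipping probes an attribution map by occluding the most relevant pixels first and tracking how quickly the predicted class score collapses; a faithful explanation produces a steep drop. Below is a minimal sketch assuming a hypothetical `model` callable that returns class probabilities and an `attribution` map with the image's spatial shape; the zero-imputation fill value is only one of several possible occlusion strategies.

```python
import numpy as np

def pixel_flipping_curve(model, image, attribution, target_class,
                         n_steps=20, fill_value=0.0):
    """Flip pixels in order of decreasing attribution and record the model's
    probability for `target_class` after each step.

    model: callable mapping an image (H, W, C) to a probability vector.
    attribution: relevance map of shape (H, W); higher = more relevant.
    """
    h, w = attribution.shape
    order = np.argsort(attribution, axis=None)[::-1]   # most relevant first
    flipped = image.copy()
    scores = [model(flipped)[target_class]]
    step = max(1, (h * w) // n_steps)
    for i in range(0, h * w, step):
        ys, xs = np.unravel_index(order[i:i + step], (h, w))
        flipped[ys, xs, :] = fill_value                # occlude these pixels
        scores.append(model(flipped)[target_class])
    return np.array(scores)   # a fast drop indicates a faithful explanation

# Usage with a toy "model" that scores the mean intensity (illustration only):
toy_model = lambda img: np.array([img.mean(), 1.0 - img.mean()])
img = np.random.default_rng(1).random((8, 8, 3))
attr = img.mean(axis=-1)                               # stand-in saliency map
print(pixel_flipping_curve(toy_model, img, attr, target_class=0))
```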
4.2. Approach 2: Probabilistic Predictor (Variation in f)
- Deep Ensembles [88], with code repository (https://github.com/Kyushik/Predictive-Uncertainty-Estimation-using-Deep-Ensemble (accessed on 19 May 2024)) and [89];
- Bayes by Backprop [90];
- Discriminative Jackknife (via influence functions) [91];
- Laplace Approximation (https://bookdown.org/rdpeng/advstatcomp/laplace-approximation.html (accessed on 19 May 2024));
- Probabilistic Backpropagation [94], with code repository (https://ymd_h.gitlab.io/b4tf/algorithms/pbp (accessed on 19 May 2024));
- Stochastic Expectation Propagation [95];
- Calibrated Explanations [85], with code repository (https://github.com/Moffran/calibrated_explanations (accessed on 19 May 2024)) and exemplified below.
4.2.1. BNN: Monte Carlo Dropout
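Monte Carlo Dropout approximates a BNN by keeping dropout stochastic at inference time and treating repeated forward passes as samples from the approximate posterior predictive. The following PyTorch sketch uses a placeholder classifier; only the presence of `nn.Dropout` layers is essential, the architecture and inputs are illustrative.

```python
import torch
import torch.nn as nn

# Placeholder classifier with dropout; any network containing nn.Dropout works.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 3),
)

def mc_dropout_predict(model, x, n_samples=50):
    """Keep dropout active at test time and average softmax outputs.
    Returns the mean prediction and the per-class standard deviation
    across samples as a simple epistemic-uncertainty proxy."""
    model.eval()
    for m in model.modules():               # re-enable only the dropout layers
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    return probs.mean(dim=0), probs.std(dim=0)

x = torch.randn(4, 20)                      # four dummy inputs
mean_probs, std_probs = mc_dropout_predict(model, x)
print(mean_probs.shape, std_probs.shape)    # torch.Size([4, 3]) each
```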
4.2.2. Conformal Predictor: Calibrated Explanations
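Calibrated Explanations builds on Venn–Abers and conformal calibration to attach uncertainty intervals to both predictions and feature importance scores; we do not restate the package's API here. Instead, the sketch below illustrates the underlying split-conformal idea with a generic regressor: calibrate the absolute residuals on held-out data and turn their empirical quantile into a prediction interval. All model and data choices are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Toy data; any point predictor can be wrapped in the same way.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + 0.5 * rng.normal(size=1000)

X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Split conformal: use the (1 - alpha) empirical quantile of the calibration
# residuals as a symmetric interval half-width.
alpha = 0.1
residuals = np.abs(y_cal - model.predict(X_cal))
level = min(np.ceil((1 - alpha) * (len(residuals) + 1)) / len(residuals), 1.0)
q = np.quantile(residuals, level)

x_new = rng.normal(size=(1, 5))
pred = model.predict(x_new)[0]
print(f"prediction {pred:.2f}, ~90% interval [{pred - q:.2f}, {pred + q:.2f}]")
```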
4.3. Approach 3: Stochastic Explainers (Variation in e)
4.3.1. CXPlain
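CXPlain trains a separate explanation model (e.g., a U-Net for images) on Granger-style importance targets, namely the relative increase in prediction loss when each input feature is masked, and quantifies the uncertainty of the resulting attributions with a bootstrap ensemble of explanation models. The sketch below computes only those masking-based importance targets for a single tabular instance; the explanation model and the bootstrap are omitted, and the zero-baseline masking is an illustrative assumption.

```python
import numpy as np

def granger_importance(predict_proba, x, y_true, baseline=0.0):
    """Relative loss increase when masking each feature of one sample x.
    predict_proba: callable mapping a (d,) array to class probabilities."""
    eps = 1e-12
    def nll(p):                               # negative log-likelihood of the true class
        return -np.log(p[y_true] + eps)

    loss_full = nll(predict_proba(x))
    deltas = np.empty(len(x))
    for j in range(len(x)):
        x_masked = x.copy()
        x_masked[j] = baseline                # "remove" feature j
        deltas[j] = max(nll(predict_proba(x_masked)) - loss_full, 0.0)
    return deltas / (deltas.sum() + eps)      # normalized importance targets

# Toy black-box classifier over four features (illustration only).
w = np.array([2.0, -1.0, 0.5, 0.0])
def predict_proba(x):
    p1 = 1.0 / (1.0 + np.exp(-(x @ w)))
    return np.array([1.0 - p1, p1])

x = np.array([1.0, 2.0, -1.0, 0.5])
print(granger_importance(predict_proba, x, y_true=1))
```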
4.3.2. BayesLIME
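BayesLIME (and the related BayLIME) replaces LIME's least-squares surrogate with a Bayesian linear model, so every local feature-importance coefficient comes with a posterior uncertainty estimate. The sketch below is in that spirit rather than the authors' implementation: it fits scikit-learn's BayesianRidge on random binary perturbations of one instance, with a simplified proximity kernel.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

def bayesian_local_explanation(predict, x, n_samples=500, sigma=1.0, seed=0):
    """Fit a Bayesian linear surrogate around instance x.
    Returns coefficient means and standard deviations (uncertainty)."""
    rng = np.random.default_rng(seed)
    d = len(x)
    # Binary interpretable representation: 1 = keep feature, 0 = replace by 0.
    Z = rng.integers(0, 2, size=(n_samples, d))
    X_pert = Z * x                                       # perturbed neighbours of x
    y_pert = np.array([predict(row) for row in X_pert])  # black-box outputs
    # Exponential kernel weights samples by proximity to the original instance.
    dist = np.linalg.norm(X_pert - x, axis=1)
    weights = np.exp(-(dist ** 2) / (2 * sigma ** 2))

    surrogate = BayesianRidge()
    surrogate.fit(Z, y_pert, sample_weight=weights)
    coef_std = np.sqrt(np.diag(surrogate.sigma_))        # posterior std of coefficients
    return surrogate.coef_, coef_std

# Toy black box (illustration only): a smooth function of three features.
black_box = lambda v: np.tanh(v[0]) + 0.3 * v[1] ** 2
x = np.array([1.0, -2.0, 0.5])
mean_w, std_w = bayesian_local_explanation(black_box, x)
for j, (m, s) in enumerate(zip(mean_w, std_w)):
    print(f"feature {j}: importance {m:+.3f} ± {s:.3f}")
```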
4.3.3. TCAV
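TCAV learns a Concept Activation Vector (CAV) as the normal of a linear classifier separating layer activations of concept images from those of random images, and reports the fraction of class examples whose directional derivative along the CAV is positive. In the sketch below, the activations and the gradients of the class logit with respect to the layer are synthetic placeholders standing in for a real network's internals.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 128                                    # width of the chosen network layer

# Placeholder activations: concept examples vs. random counterexamples.
acts_concept = rng.normal(loc=0.5, size=(80, d))
acts_random = rng.normal(loc=0.0, size=(80, d))

# CAV = normal vector of a linear classifier separating the two sets.
clf = LogisticRegression(max_iter=1000).fit(
    np.vstack([acts_concept, acts_random]),
    np.concatenate([np.ones(80), np.zeros(80)]),
)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

# Placeholder gradients of the class logit w.r.t. the layer activations,
# one row per example of the class under investigation.
grads = rng.normal(loc=0.1, size=(200, d))

# TCAV score: fraction of examples whose directional derivative along the CAV
# is positive, i.e. moving towards the concept increases the class logit.
tcav_score = np.mean(grads @ cav > 0)
print(f"TCAV score: {tcav_score:.2f}")
```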
4.3.4. CoProNN
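CoProNN explains a prediction by relating the query image's deep features to features of concept prototype images via nearest neighbours, yielding concept votes together with example-based evidence. The sketch below uses placeholder embeddings and concept labels; a real pipeline would extract features with a pretrained CNN and generate the prototypes, and the cosine-similarity vote rule here is an illustrative simplification.

```python
import numpy as np

def copronn_explain(query_feat, proto_feats, proto_concepts, k=5):
    """Nearest-prototype explanation: which concepts dominate among the k
    prototypes closest (by cosine similarity) to the query embedding?"""
    sims = proto_feats @ query_feat / (
        np.linalg.norm(proto_feats, axis=1) * np.linalg.norm(query_feat)
    )
    top = np.argsort(sims)[::-1][:k]
    concepts, counts = np.unique(proto_concepts[top], return_counts=True)
    return dict(zip(concepts, counts / k))        # concept vote shares

# Placeholder embeddings: 30 prototypes of three concepts, 512-d features.
rng = np.random.default_rng(0)
proto_feats = rng.normal(size=(30, 512))
proto_concepts = np.array(["stripes", "fuzzy body", "pollen basket"])[
    rng.integers(0, 3, 30)
]
query_feat = rng.normal(size=512)

print(copronn_explain(query_feat, proto_feats, proto_concepts))
```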
5. Human Perception and Uncertainty in XAI
5.1. Neural and Cognitive Aspects of Uncertainty
5.2. Uncertainty via Explanation Fragility
5.3. Effects of Communicating Uncertainty
6. Discussion
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
ACE | Automated Concept-based Explanations |
BNN | Bayesian Neural Network |
CNN | Convolutional Neural Network |
CoProNN | Concept-based Prototypical Nearest Neighbors |
CXPlain | Causal Explanations for Model Interpretation under Uncertainty |
DNN | Deep Neural Network |
HIL | Human-in-the-Loop |
ICE | Individual Conditional Expectation |
LIME | Local Interpretable Model-agnostic Explanations |
ML | Machine Learning |
PDP | Partial Dependence Plots |
SHAP | Shapley Additive Explanations |
TCAV | Testing with Concept Activation Vectors |
XAI | Explainable Artificial Intelligence |
References
- Angelopoulos, A.N.; Bates, S. A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification. arXiv 2022, arXiv:2107.07511. [Google Scholar]
- Goan, E.; Fookes, C. Bayesian Neural Networks: An Introduction and Survey. In Lecture Notes in Mathematics; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 45–87. [Google Scholar] [CrossRef]
- Liu, B. In AI We Trust? Effects of Agency Locus and Transparency on Uncertainty Reduction in Human–AI Interaction. J. Comput. Mediat. Commun. 2021, 26, 384–402. [Google Scholar] [CrossRef]
- Phillips, R.L.; Chang, K.H.; Friedler, S.A. Interpretable active learning. arXiv 2017, arXiv:1708.00049. [Google Scholar]
- Antorán, J.; Bhatt, U.; Adel, T.; Weller, A.; Hernández-Lobato, J.M. Getting a CLUE: A Method for Explaining Uncertainty Estimates. arXiv 2021, arXiv:2006.06848. [Google Scholar]
- Mougan, C.; Nielsen, D.S. Monitoring Model Deterioration with Explainable Uncertainty Estimation via Non-parametric Bootstrap. arXiv 2022, arXiv:2201.11676. [Google Scholar] [CrossRef]
- Brown, K.E.; Talbert, D.A. Using Explainable AI to Measure Feature Contribution to Uncertainty. Int. FLAIRS Conf. Proc. 2022. [Google Scholar] [CrossRef]
- Goodman, B.; Flaxman, S. European Union Regulations on Algorithmic Decision Making and a “Right to Explanation”. AI Mag. 2017, 38, 50–57. [Google Scholar] [CrossRef]
- Guidotti, R.; Monreale, A.; Ruggieri, S.; Turini, F.; Pedreschi, D.; Giannotti, F. A Survey Of Methods For Explaining Black Box Models. arXiv 2018, arXiv:1802.01933. [Google Scholar] [CrossRef]
- Tomsett, R.; Braines, D.; Harborne, D.; Preece, A.; Chakraborty, S. Interpretable to Whom? A Role-based Model for Analyzing Interpretable Machine Learning Systems. arXiv 2018, arXiv:1806.07552. [Google Scholar]
- Schwalbe, G.; Finzel, B. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Min. Knowl. Disc. 2023. [Google Scholar] [CrossRef]
- Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805. [Google Scholar] [CrossRef]
- Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy 2021, 23, 18. [Google Scholar] [CrossRef] [PubMed]
- Véliz, C.; Prunkl, C.; Phillips-Brown, M.; Lechterman, T.M. We might be afraid of black-box algorithms. J. Med. Ethics 2021, 47, 339–340. [Google Scholar] [CrossRef]
- Hedström, A.; Weber, L.; Krakowczyk, D.; Bareeva, D.; Motzkus, F.; Samek, W.; Lapuschkin, S.; Höhne, M.M.M. Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond. J. Mach. Learn. Res. 2023, 24, 1–11. [Google Scholar]
- Murdoch, W.J.; Singh, C.; Kumbier, K.; Abbasi-Asl, R.; Yu, B. Definitions, methods, and applications in interpretable machine learning. Proc. Natl. Acad. Sci. USA 2019, 116, 22071–22080. [Google Scholar] [CrossRef] [PubMed]
- Carvalho, D.V.; Pereira, E.M.; Cardoso, J.S. Machine Learning Interpretability: A Survey on Methods and Metrics. Electronics 2019, 8, 832. [Google Scholar] [CrossRef]
- Arrieta, A.B.; Díaz-Rodríguez, N.; Ser, J.D.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. arXiv 2019, arXiv:1910.10045. [Google Scholar]
- Doshi-Velez, F.; Kim, B. Towards A Rigorous Science of Interpretable Machine Learning. arXiv 2017, arXiv:1702.08608. [Google Scholar]
- Das, A.; Rad, P. Opportunities and Challenges in Explainable Artificial Intelligence (XAI): A Survey. arXiv 2020, arXiv:2006.11371. [Google Scholar]
- Chromik, M.; Schuessler, M. A Taxonomy for Human Subject Evaluation of Black-Box Explanations in XAI. In Proceedings of the ExSS-ATEC@IUI, Cagliari, Italy, 17–20 March 2020. [Google Scholar]
- Hoffman, R.R.; Mueller, S.T.; Klein, G.; Litman, J. Metrics for Explainable AI: Challenges and Prospects. arXiv 2019, arXiv:1812.04608. [Google Scholar]
- Schmidt, P.; Biessmann, F. Quantifying interpretability and trust in machine learning systems. In Proceedings of the AAAI-19 Workshop on Network Interpretability for Deep Learning, Honolulu, HI, USA, 27 January–2 February 2019. [Google Scholar]
- Adadi, A.; Berrada, M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160. [Google Scholar] [CrossRef]
- Wang, D.; Yang, Q.; Ashraf, A.; Lim, B. Designing Theory-Driven User-Centric Explainable AI. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, Scotland, UK, 4–9 May 2019. [Google Scholar] [CrossRef]
- Biessmann, F.; Refiano, D. Quality metrics for transparent machine learning with and without humans in the loop are not correlated. In Proceedings of the ICML Workshop on Theoretical Foundations, Criticism, and Application Trends of Explainable AI Held in Conjunction with the 38th International Conference on Machine Learning (ICML), Vienna, Austria, 18 July 2020. [Google Scholar]
- Hamm, C.A.; Baumgärtner, G.L.; Biessmann, F.; Beetz, N.L.; Hartenstein, A.; Savic, L.J.; Froböse, K.; Dräger, F.; Schallenberg, S.; Rudolph, M.; et al. Interactive Explainable Deep Learning Model Informs Prostate Cancer Diagnosis at MRI. Radiology 2023, 307, e222276. [Google Scholar] [CrossRef] [PubMed]
- Schmidt, P.; Biessmann, F. Calibrating human-AI collaboration: Impact of risk, ambiguity and transparency on algorithmic bias. In Proceedings of the 2020 Cross Domain Conference for Machine Learning and Knowledge Extraction, Dublin, Ireland, 25–28 August 2020. [Google Scholar]
- Nauta, M.; Trienes, J.; Pathak, S.; Nguyen, E.; Peters, M.; Schmitt, Y.; Schlötterer, J.; van Keulen, M.; Seifert, C. From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI. ACM Comput. Surv. 2023, 55, 1–42. [Google Scholar] [CrossRef]
- Patrício, C.; Neves, J.C.; Teixeira, L.F. Explainable Deep Learning Methods in Medical Image Classification: A Survey. arXiv 2023, arXiv:2205.04766. [Google Scholar] [CrossRef]
- Molnar, C. Interpretable Machine Learning. 2022. Available online: https://christophm.github.io/interpretable-ml-book (accessed on 19 May 2024).
- Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic Attribution for Deep Networks. arXiv 2017, arXiv:1703.01365. [Google Scholar]
- Kim, B.; Wattenberg, M.; Gilmer, J.; Cai, C.; Wexler, J.; Viegas, F.; Sayres, R. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). arXiv 2018, arXiv:1711.11279. [Google Scholar]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. arXiv 2016, arXiv:1602.04938. [Google Scholar]
- Koh, P.W.; Nguyen, T.; Tang, Y.S.; Mussmann, S.; Pierson, E.; Kim, B.; Liang, P. Concept Bottleneck Models. arXiv 2020, arXiv:2007.04612. [Google Scholar]
- Bau, D.; Zhou, B.; Khosla, A.; Oliva, A.; Torralba, A. Network Dissection: Quantifying Interpretability of Deep Visual Representations. arXiv 2017, arXiv:1704.05796. [Google Scholar]
- Yeh, C.K.; Kim, J.S.; Yen, I.E.H.; Ravikumar, P. Representer Point Selection for Explaining Deep Neural Networks. arXiv 2018, arXiv:1811.09720. [Google Scholar]
- Sammani, F.; Mukherjee, T.; Deligiannis, N. NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks. arXiv 2022, arXiv:2203.05081. [Google Scholar]
- Herm, L.V.; Heinrich, K.; Wanner, J.; Janiesch, C. Stop ordering machine learning algorithms by their explainability! A user-centered investigation of performance and explainability. Int. J. Inf. Manag. 2023, 69, 102538. [Google Scholar] [CrossRef]
- Rong, Y.; Leemann, T.; Nguyen, T.T.; Fiedler, L.; Qian, P.; Unhelkar, V.; Seidel, T.; Kasneci, G.; Kasneci, E. Towards Human-Centered Explainable AI: A Survey of User Studies for Model Explanations. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 2104–2122. [Google Scholar] [CrossRef]
- Hamm, P.; Klesel, M.; Coberger, P.; Wittmann, H.F. Explanation matters: An experimental study on explainable AI. Electron. Mark. 2023, 33, 17. [Google Scholar] [CrossRef]
- Leichtmann, B.; Humer, C.; Hinterreiter, A.; Streit, M.; Mara, M. Effects of Explainable Artificial Intelligence on trust and human behavior in a high-risk decision task. Comput. Hum. Behav. 2023, 139, 107539. [Google Scholar] [CrossRef]
- Meske, C.; Bunde, E. Design Principles for User Interfaces in AI-Based Decision Support Systems: The Case of Explainable Hate Speech Detection. Inf. Syst. Front. 2022, 25, 743–773. [Google Scholar] [CrossRef]
- Shafti, A.; Derks, V.; Kay, H.; Faisal, A.A. The Response Shift Paradigm to Quantify Human Trust in AI Recommendations. arXiv 2022, arXiv:2202.08979. [Google Scholar]
- Druce, J.; Harradon, M.; Tittle, J. Explainable Artificial Intelligence (XAI) for Increasing User Trust in Deep Reinforcement Learning Driven Autonomous Systems. arXiv 2021, arXiv:2106.03775. [Google Scholar]
- van der Waa, J.; Nieuwburg, E.; Cremers, A.; Neerincx, M. Evaluating XAI: A comparison of rule-based and example-based explanations. Artif. Intell. 2021, 291, 103404. [Google Scholar] [CrossRef]
- Weitz, K.; Schiller, D.; Schlagowski, R.; Huber, T.; Andre, E. “Let me explain!”: Exploring the potential of virtual agents in explainable AI interaction design. J. Multimodal User Interfaces 2020, 15, 87–98. [Google Scholar] [CrossRef]
- Schmidt, P.; Biessmann, F.; Teubner, T. Transparency and trust in artificial intelligence systems. J. Decis. Syst. 2020, 29, 260–278. [Google Scholar] [CrossRef]
- Alufaisan, Y.; Marusich, L.R.; Bakdash, J.Z.; Zhou, Y.; Kantarcioglu, M. Does Explainable Artificial Intelligence Improve Human Decision-Making? arXiv 2020, arXiv:2006.11194. [Google Scholar] [CrossRef]
- David, D.B.; Resheff, Y.S.; Tron, T. Explainable AI and Adoption of Financial Algorithmic Advisors: An Experimental Study. arXiv 2021, arXiv:2101.02555. [Google Scholar]
- Poursabzi-Sangdeh, F.; Goldstein, D.G.; Hofman, J.M.; Vaughan, J.W.; Wallach, H. Manipulating and Measuring Model Interpretability. arXiv 2021, arXiv:1802.07810. [Google Scholar]
- Dietvorst, B.J.; Simmons, J.P.; Massey, C. Algorithm aversion: People erroneously avoid algorithms after seeing them err. J. Exp. Psychol. Gen. 2015, 144, 114–126. [Google Scholar] [CrossRef]
- Chen, V.; Liao, Q.V.; Vaughan, J.W.; Bansal, G. Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations. arXiv 2023, arXiv:2301.07255. [Google Scholar] [CrossRef]
- Ma, J.; Lai, V.; Zhang, Y.; Chen, C.; Hamilton, P.; Ljubenkov, D.; Lakkaraju, H.; Tan, C. OpenHEXAI: An Open-Source Framework for Human-Centered Evaluation of Explainable Machine Learning. arXiv 2024, arXiv:2403.05565. [Google Scholar]
- Alangari, N.; El Bachir Menai, M.; Mathkour, H.; Almosallam, I. Exploring Evaluation Methods for Interpretable Machine Learning: A Survey. Information 2023, 14, 469. [Google Scholar] [CrossRef]
- Schuff, H.; Adel, H.; Qi, P.; Vu, N.T. Challenges in Explanation Quality Evaluation. arXiv 2022, arXiv:2210.07126. [Google Scholar]
- Mohseni, S.; Zarei, N.; Ragan, E.D. A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems. arXiv 2020, arXiv:1811.11839. [Google Scholar] [CrossRef]
- Cugny, R.; Aligon, J.; Chevalier, M.; Roman Jimenez, G.; Teste, O. AutoXAI: A Framework to Automatically Select the Most Adapted XAI Solution. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, New York, NY, USA, 17–21 October 2022; CIKM ’22; pp. 315–324. [Google Scholar] [CrossRef]
- Herman, B. The Promise and Peril of Human Evaluation for Model Interpretability. arXiv 2019, arXiv:1711.07414v2. [Google Scholar]
- Kim, S.S.Y.; Meister, N.; Ramaswamy, V.V.; Fong, R.; Russakovsky, O. HIVE: Evaluating the Human Interpretability of Visual Explanations. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022. [Google Scholar]
- Colin, J.; Fel, T.; Cadène, R.; Serre, T. What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods. Adv. Neural Inf. Process. Syst. 2022, 35, 2832–2845. [Google Scholar] [PubMed]
- Kindermans, P.J.; Hooker, S.; Adebayo, J.; Alber, M.; Schütt, K.T.; Dähne, S.; Erhan, D.; Kim, B. The (Un)reliability of saliency methods. arXiv 2017, arXiv:1711.00867. [Google Scholar]
- Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; Kim, B. Sanity Checks for Saliency Maps. arXiv 2020, arXiv:1810.03292. [Google Scholar]
- Mei, A.; Saxon, M.; Chang, S.; Lipton, Z.C.; Wang, W.Y. Users are the North Star for AI Transparency. arXiv 2023, arXiv:2303.05500. [Google Scholar]
- Leavitt, M.L.; Morcos, A. Towards falsifiable interpretability research. arXiv 2020, arXiv:2010.12016. [Google Scholar]
- Jacovi, A.; Goldberg, Y. Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness? In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020.
- Gilpin, L.H.; Bau, D.; Yuan, B.Z.; Bajwa, A.; Specter, M.; Kagal, L. Explaining Explanations: An Overview of Interpretability of Machine Learning. arXiv 2019, arXiv:1806.00069. [Google Scholar]
- Bhatt, U.; Antorán, J.; Zhang, Y.; Liao, Q.V.; Sattigeri, P.; Fogliato, R.; Melançon, G.G.; Krishnan, R.; Stanley, J.; Tickoo, O.; et al. Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty. arXiv 2021, arXiv:2011.07586. [Google Scholar]
- Abdar, M.; Pourpanah, F.; Hussain, S.; Rezazadegan, D.; Liu, L.; Ghavamzadeh, M.; Fieguth, P.; Cao, X.; Khosravi, A.; Acharya, U.R.; et al. A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges. Inf. Fusion 2021, 76, 243–297. [Google Scholar] [CrossRef]
- Hüllermeier, E.; Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Mach. Learn. 2021, 110, 457–506. [Google Scholar] [CrossRef]
- Löfström, H. On the Definition of Appropriate Trust and the Tools that Come with it. arXiv 2023, arXiv:2309.11937. [Google Scholar]
- Hase, P.; Bansal, M. When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data. arXiv 2021, arXiv:2102.02201. [Google Scholar]
- Krishna, R.; Zhu, Y.; Groth, O.; Johnson, J.; Hata, K.; Kravitz, J.; Chen, S.; Kalantidis, Y.; Li, L.J.; Shamma, D.A.; et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. arXiv 2016, arXiv:1602.07332. [Google Scholar] [CrossRef]
- Lundberg, S.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874. [Google Scholar]
- Blücher, S.; Vielhaben, J.; Strodthoff, N. Decoupling Pixel Flipping and Occlusion Strategy for Consistent XAI Benchmarks. arXiv 2024, arXiv:2401.06654. [Google Scholar]
- Xu, H.; Ma, Y.; Liu, H.; Deb, D.; Liu, H.; Tang, J.; Jain, A.K. Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. arXiv 2019, arXiv:1909.08072. [Google Scholar] [CrossRef]
- Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.R.; Samek, W. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLoS ONE 2015, 10, e0130140. [Google Scholar] [CrossRef]
- Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv 2014, arXiv:1312.6034. [Google Scholar]
- Shrikumar, A.; Greenside, P.; Kundaje, A. Learning Important Features Through Propagating Activation Differences. arXiv 2019, arXiv:1704.02685. [Google Scholar]
- Chiaburu, T.; Biessmann, F.; Hausser, F. Towards ML Methods for Biodiversity: A Novel Wild Bee Dataset and Evaluations of XAI Methods for ML-Assisted Rare Species Annotations. arXiv 2022, arXiv:2206.07497. [Google Scholar]
- Griffiths, R.R.; Aldrick, A.A.; Garcia-Ortegon, M.; Lalchand, V.; Lee, A.A. Achieving robustness to aleatoric uncertainty with heteroscedastic Bayesian optimisation. Mach. Learn. Sci. Technol. 2021, 3, 015004. [Google Scholar] [CrossRef]
- Koenker, R.; Hallock, K.F. Quantile Regression. J. Econ. Perspect. 2001, 15, 143–156. [Google Scholar] [CrossRef]
- Romano, Y.; Patterson, E.; Candès, E.J. Conformalized Quantile Regression. arXiv 2019, arXiv:1905.03222. [Google Scholar]
- Wang, Z.; Ku, A.; Baldridge, J.; Griffiths, T.L.; Kim, B. Gaussian Process Probes (GPP) for Uncertainty-Aware Probing. arXiv 2023, arXiv:2305.18213. [Google Scholar]
- Lofstrom, H.; Lofstrom, T.; Johansson, U.; Sonstrod, C. Calibrated Explanations: With Uncertainty Information and Counterfactuals. arXiv 2023, arXiv:2305.02305. [Google Scholar] [CrossRef]
- Bykov, K.; Höhne, M.M.C.; Müller, K.R.; Nakajima, S.; Kloft, M. How Much Can I Trust You?—Quantifying Uncertainties in Explaining Neural Networks. arXiv 2020, arXiv:2006.09000. [Google Scholar]
- Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; Balcan, M.F., Weinberger, K.Q., Eds.; Volume 48, Proceedings of Machine Learning Research. pp. 1050–1059. [Google Scholar]
- Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. arXiv 2017, arXiv:1612.01474. [Google Scholar]
- Yang, C.I.; Li, Y.P. Explainable uncertainty quantifications for deep learning-based molecular property prediction. J. Cheminform. 2023, 15, 13. [Google Scholar] [CrossRef]
- Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight Uncertainty in Neural Networks. arXiv 2015, arXiv:1505.05424. [Google Scholar]
- Alaa, A.M.; van der Schaar, M. Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions. arXiv 2020, arXiv:2007.13481. [Google Scholar]
- Graves, A. Practical Variational Inference for Neural Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Nice, France, 2011; Volume 24. [Google Scholar]
- Bishop, C.M. Pattern Recognition and Machine Learning (Information Science and Statistics); Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Hernández-Lobato, J.M.; Adams, R.P. Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks. arXiv 2015, arXiv:1502.05336. [Google Scholar]
- Li, Y.; Hernandez-Lobato, J.M.; Turner, R.E. Stochastic Expectation Propagation. arXiv 2015, arXiv:1506.04132. [Google Scholar]
- Vovk, V.; Petej, I. Venn-Abers predictors. arXiv 2014, arXiv:1211.0025. [Google Scholar]
- Chen, C.; Li, O.; Tao, C.; Barnett, A.J.; Su, J.; Rudin, C. This Looks Like That: Deep Learning for Interpretable Image Recognition. arXiv 2019, arXiv:1806.10574. [Google Scholar]
- Schwab, P.; Karlen, W. CXPlain: Causal Explanations for Model Interpretation under Uncertainty. arXiv 2019, arXiv:1910.12336. [Google Scholar]
- Slack, D.; Hilgard, S.; Singh, S.; Lakkaraju, H. Reliable Post hoc Explanations: Modeling Uncertainty in Explainability. arXiv 2021, arXiv:2008.05030. [Google Scholar]
- Zhao, X.; Huang, W.; Huang, X.; Robu, V.; Flynn, D. BayLIME: Bayesian Local Interpretable Model-Agnostic Explanations. arXiv 2021, arXiv:2012.03058. [Google Scholar]
- Ghorbani, A.; Wexler, J.; Zou, J.; Kim, B. Towards Automatic Concept-based Explanations. arXiv 2019, arXiv:1902.03129. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597. [Google Scholar]
- Efron, B. The Jackknife, the Bootstrap and Other Resampling Plans; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1982. [Google Scholar] [CrossRef]
- Deng, L. The mnist database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. 2012, 29, 141–142. [Google Scholar] [CrossRef]
- Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2012. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv 2021, arXiv:2112.10752v2. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the CVPR09, Miami, FL, USA, 20–25 June 2009. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
- Jacovi, A.; Marasović, A.; Miller, T.; Goldberg, Y. Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual, 3–10 March 2021. [Google Scholar] [CrossRef]
- Lee, J.D.; See, K.A. Trust in Automation: Designing for Appropriate Reliance. Hum. Factors 2004, 46, 50–80. [Google Scholar] [CrossRef] [PubMed]
- Kepecs, A.; Mainen, Z.F. A computational framework for the study of confidence in humans and animals. Philos. Trans. R. Soc. B Biol. Sci. 2012, 367, 1322–1337. [Google Scholar] [CrossRef] [PubMed]
- Walker, E.Y.; Pohl, S.; Denison, R.N.; Barack, D.L.; Lee, J.; Block, N.; Ma, W.J.; Meyniel, F. Studying the neural representations of uncertainty. Nat. Neurosci. 2023, 26, 1857–1867. [Google Scholar] [CrossRef]
- Pouget, A.; Drugowitsch, J.; Kepecs, A. Confidence and certainty: Distinct probabilistic quantities for different goals. Nat. Neurosci. 2016, 19, 366–374. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y.; Song, K.; Sun, Y.; Tan, S.; Udell, M. “Why Should You Trust My Explanation?” Understanding Uncertainty in LIME Explanations. arXiv 2019, arXiv:1904.12991. [Google Scholar]
- Ghorbani, A.; Abid, A.; Zou, J. Interpretation of Neural Networks is Fragile. arXiv 2018, arXiv:1710.10547. [Google Scholar] [CrossRef]
- Yin, M.; Vaughan, J.W.; Wallach, H. Understanding the effect of accuracy on trust in machine learning models. In Proceedings of the CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019; pp. 1–12. [Google Scholar] [CrossRef]
Name | Local/Global | Specific/Agnostic | Modality | Input Type | Task |
---|---|---|---|---|---|
CXPlain (https://github.com/d909b/cxplain (accessed on 5 May 2024)) [98] | local | agnostic | feature attribution | image, tabular, text | classification, regression |
BayesLIME (https://github.com/dylan-slack/Modeling-Uncertainty-Local-Explainability (accessed on 5 May 2024)) [99] BayLIME (https://github.com/x-y-zhao/BayLime (accessed on 5 May 2024)) [100] | local | agnostic | feature attribution | image, tabular | classification |
TCAV (https://github.com/tensorflow/tcav (accessed on 5 May 2024)) [33] & ACE [101] | local & global | specific | concepts | image | classification |
CoProNN (https://github.com/TeodorChiaburu/beexplainable (accessed on 5 May 2024)) | local & global | specific | concepts, examples | image | classification |