Article

An Information Theoretic Approach to Privacy-Preserving Interpretable and Transferable Learning

1 Faculty of Computer Science and Electrical Engineering, University of Rostock, 18051 Rostock, Germany
2 Software Competence Center Hagenberg GmbH, A-4232 Hagenberg, Austria
3 Institute of Signal Processing, Johannes Kepler University Linz, 4040 Linz, Austria
* Author to whom correspondence should be addressed.
Algorithms 2023, 16(9), 450; https://doi.org/10.3390/a16090450
Submission received: 7 June 2023 / Revised: 30 August 2023 / Accepted: 8 September 2023 / Published: 20 September 2023
(This article belongs to the Special Issue Deep Learning Techniques for Computer Security Problems)

Abstract: In order to develop machine learning and deep learning models that take into account the guidelines and principles of trustworthy AI, a novel information theoretic approach is introduced in this article. A unified approach to privacy-preserving interpretable and transferable learning is considered for studying and optimizing the trade-offs between the privacy, interpretability, and transferability aspects of trustworthy AI. A variational membership-mapping Bayesian model is used for the analytical approximation of the defined information theoretic measures for privacy leakage, interpretability, and transferability. The approach consists of approximating the information theoretic measures by maximizing a lower-bound using variational optimization. The approach is demonstrated through numerous experiments on benchmark datasets and a real-world biomedical application concerned with the detection of mental stress in individuals using heart rate variability analysis.

1. Introduction

Trust in the development, deployment, and use of AI is essential in order to fully utilize the potential of AI to contribute to human well-being and society. Recent advances in machine and deep learning have rejuvenated the field of AI with an enthusiasm that AI could become an integral part of human life. However, a rapid proliferation of AI will give rise to several ethical, legal, and social issues.

1.1. Trustworthy AI

In response to the ethical, legal, and social challenges that accompany AI, guidelines and ethical principles have been established [1,2,3,4] in order to evaluate the responsible development of AI systems that are good for humanity and the environment. These guidelines have introduced the concept of trustworthy AI (TAI), and the term TAI has quickly gained attention in research and practice. TAI is based on the idea that trust in AI will allow AI to realize its full potential in contributing to societies, economies, and sustainable development. As “trust” is a complex phenomenon being studied in diverse disciplines (i.e., psychology, sociology, economics, management, computer science, and information systems), the definition and realization of TAI remains challenging. While forming trust in technology, users express expectations about the technology’s functionality, helpfulness and reliability [5]. The authors in [6] state that “AI is perceived as trustworthy by its users (e.g., consumers, organizations, and society) when it is developed, deployed, and used in ways that not only ensure its compliance with all relevant laws and its robustness but especially its adherence to general ethical principles”.
In recent times, academics, industry, and policymakers have developed several frameworks and guidelines for TAI, including the “Asilomar AI Principles” [7], “Montreal Declaration of Responsible AI” [8], “UK AI Code” [9], “AI4People” [4], “Ethics Guidelines for Trustworthy AI” [1], “OECD Principles on AI” [10], “Governance Principles for the New Generation Artificial Intelligence” [11], and “Guidance for Regulation of Artificial Intelligence Applications” [12]. However, it was argued in [13] that AI ethics lack a reinforcement mechanism and that economic incentives could easily override commitment to ethical principles and values.
The five principles of ethical AI [4] (i.e., beneficence, non-maleficence, autonomy, justice, and explicability) have been adopted for TAI [6]. Beneficence refers to promoting the well-being of humans, preserving dignity, and sustaining the planet. Non-maleficence refers to avoiding bringing harm to people and is especially concerned with the protection of people’s privacy and security. Autonomy refers to the promotion of human autonomy, agency, and oversight including the restriction of AI systems’ autonomy, where necessary. Justice refers to using AI for correcting past wrongs, ensuring shared benefits through AI, and preventing the creation of new harms and inequities by AI. Explicability comprises an epistemological sense and an ethical sense. Explicability refers in the epistemological sense to the explainable AI developed by creating interpretable AI models with high levels of performance and accuracy. In the ethical sense, explicability refers to accountable AI.

1.2. Motivation and Novelty

The core issues related to machine and deep learning that need to be addressed in order to fulfill the five principles of trustworthy AI are listed in Table 1.
Solution approaches to address issues concerning TAI have been identified in Table 1; however, a unified solution approach addressing all major issues does not exist. Despite the importance of the outlined TAI principles, their major limitation, as identified in [6], is that the principles are highly general and provide little to no guidance on how they can be transferred into practice. To address this limitation, a data-driven research framework for TAI was outlined in [6]. However, to the best of the authors’ knowledge, no previous study has presented a unified information theoretic approach that studies the privacy, interpretability, and transferability aspects of trustworthy AI in a rigorous analytical manner. This motivated us to develop such an approach. The novelty of this study is a unified information theoretic approach to “privacy-preserving interpretable and transferable learning”, as represented in Figure 1, for addressing trustworthy AI issues.

1.3. Goal and Aims

Our goal is to develop a novel approach to trustworthy AI based on the hypothesis that information theory enables taking into account the privacy, interpretability, and transferability aspects of trustworthy AI principles during the development of machine learning and deep learning models by providing a way to study and optimize the inherent trade-offs. The aims guiding the development of our approach are the following:
Aim 1:
To develop an information theoretic approach to privacy that enables the quantification of privacy leakage in terms of the mutual information between sensitive private data and the data released to the public without the availability of prior knowledge about data statistics (such as joint distributions of public and private variables).
Aim 2:
To develop an information theoretic criterion for evaluating the interpretability of a machine learning model in terms of the mutual information between non-interpretable model outputs/activations and corresponding interpretable parameters.
Aim 3:
To develop an information theoretic criterion for evaluating the transferability (of a machine learning model from source to target domain) in terms of the mutual information between source domain model outputs/activations and target domain model outputs/activations.
Aim 4:
To develop analytical approaches to machine and deep learning allowing for the quantification of model uncertainties.
Aim 5:
To develop a unified approach to “privacy-preserving interpretable and transferable learning” for an analytical optimization of privacy–interpretability–transferability trade-offs.

1.4. Methodology

Figure 2 outlines the methodological workflow. For an information theoretic evaluation of the privacy leakage, interpretability, and transferability, we provide a novel method that consists of the following three steps:

1.4.1. Defining Measures in Terms of the Information Leakages

The privacy, interpretability, and transferability measures are defined in terms of the information leakages (summarized in the display following this list):
  • Privacy leakage is measured as the amount of information about private/sensitive variables leaked by the shared variables;
  • Interpretability is measured as the amount of information about interpretable parameters leaked by the model;
  • Transferability is measured as the amount of information about the source domain model output leaked by the target domain model output.
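For concreteness, the three measures just listed can be written compactly as information leakages (they are formalized later in Definitions 10–12). In the following display, $x_{sr}$, $t_{sr}$, $y_{sr}^{+}$, $\hat{y}_{tg}^{sr}$, and $\hat{y}_{tg}^{tg}$ denote, respectively, the private variables, the interpretable parameters, the noise-added (shared) data vector, and the source and target domain feature vectors:
$$IL_{\text{privacy}} = I\big(x_{sr};\, y_{sr}^{+}\big) - H(x_{sr}), \qquad IL_{\text{interpretability}} = I\big(t_{sr};\, y_{sr}^{+}\big) - H(t_{sr}), \qquad IL_{\text{transferability}} = I\big(\hat{y}_{tg}^{sr};\, \hat{y}_{tg}^{tg}\big) - H\big(\hat{y}_{tg}^{sr}\big).$$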

1.4.2. Variational Membership Mapping Bayesian Models

In order to derive analytical expressions for the defined privacy leakage, interpretability, and transferability measures, stochastic inverse models (governing the relationships amongst variables) are required. In this study, variational membership mappings are leveraged to build the required stochastic inverse models. Membership mappings [14,15] have been introduced as an alternative to deep neural networks in order to address issues such as determining the optimal model structure, learning from smaller training datasets, and the iterative, time-consuming nature of numerical learning algorithms [16,17,18,19,20,21,22]. A membership mapping represents data through a fuzzy set (characterized by a membership function whose dimension increases with increasing data size). A remarkable feature of membership mappings is that they allow an analytical approach to the variational learning of a membership-mappings-based data representation model. Our idea is to employ membership mappings to define a stochastic inverse model, which is then inferred using variational Bayesian methodology.

1.4.3. Variational Approximation of Information Theoretic Measures

The variational membership-mapping Bayesian models are used to determine the lower bounds on the defined information theoretic measures for privacy leakage, interpretability, and transferability. The lower bounds are then maximized using variational optimization methodology to derive analytically the expressions that approximate the privacy leakage, interpretability, and transferability measures. The analytically derived expressions form the basis of an algorithm that practically computes the measures using available data samples, where expectations over unknown distributions are approximated by sample averages.
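The essence of this step, developed in detail in Section 4, is the following variational bound: for any distribution $q(\theta, \gamma)$ over the model parameters, the information leakage equals a lower bound plus an expected Kullback–Leibler divergence, so the measure is approximated by maximizing the bound over $q$:
$$IL_{f_{t\to x}} = \langle \mathcal{L}(q(\theta,\gamma), t, x)\rangle_{p(t,x)} + \langle \mathrm{KL}(q(\theta,\gamma)\,\|\,p(\theta,\gamma \mid t,x))\rangle_{p(t,x)} \;\ge\; \langle \mathcal{L}(q(\theta,\gamma), t, x)\rangle_{p(t,x)}, \qquad \widehat{IL}_{f_{t\to x}} = \max_{q(\theta,\gamma)} \langle \mathcal{L}(q(\theta,\gamma), t, x)\rangle_{p(t,x)}.$$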

1.5. Contributions

The main contributions of this study are the following:

1.5.1. A Unified Approach to Study the Privacy, Interpretability, and Transferability Aspects of Trustworthy AI

The study introduces a novel information theoretic unified approach (as represented in Figure 1) to address the
  • Issues I1 and I2 of beneficence principle by means of transfer and federated learning;
  • Issues I3 and I4 of non-maleficence principle by means of privacy-preserving data release mechanisms;
  • Issue I5 of autonomy principle by means of analytical machine and deep learning algorithms that enable the user to quantify model uncertainties and hence to decide the level of autonomy given to AI systems;
  • Issue I6 of justice principle by means of federated learning;
  • Issue I7 of explicability principle by means of interpretable machine and deep learning models.

1.5.2. Information Theoretic Quantification of Privacy, Interpretability, and Transferability

The most important feature of our approach is that the notions of privacy, interpretability, and transferability are quantified by information theoretic measures, allowing for the study and optimization of trade-offs (such as the trade-off between privacy and transferability or between privacy and interpretability) in a practical manner.

1.5.3. Computation of Information Theoretic Measures without Requiring the Knowledge of Data Distributions

It is possible to derive analytical expressions for the defined measures provided that knowledge of the data distributions is available. However, in practice, the data distributions are unknown, and thus, a way to approximate the defined measures is required. Therefore, a novel method that employs recently introduced membership mappings [14,15,16,17,18,19,20,21,22] is presented for approximating the defined privacy leakage, interpretability, and transferability measures. The method relies on inferring a variational Bayesian model that facilitates an analytical approximation of the information theoretic measures through variational optimization methodology. A computational algorithm is provided for practically calculating the privacy leakage, interpretability, and transferability measures. Finally, an algorithm is presented that provides
  • Information theoretic evaluation of privacy leakage, interpretability, and transferability in a semi-supervised transfer and multi-task learning scenario;
  • An adversary model for estimating private data and for simulating privacy attacks; and
  • An interpretability model for estimating interpretable parameters and for providing an interpretation to the non-interpretable data vectors.

1.6. Organization

The remainder of this article is organized as follows. The proposed methodology relies on membership mappings for data representation learning; therefore, Section 2 is dedicated to a review of membership mappings. An application of membership mappings to solve an inverse modeling problem by developing a variational membership-mapping Bayesian model is considered in Section 3. Section 4 presents the most important result of this study on the variational approximation of information leakage and the development of a computational algorithm for calculating it. The measures for privacy leakage, interpretability, and transferability are formally introduced in Section 5, which further provides an algorithm to study the privacy, interpretability, and transferability aspects in a unified manner. The application of the proposed measures to study the trade-offs is demonstrated through experiments on the widely used MNIST and “Office+Caltech256” datasets in Section 6, which further considers a biomedical application concerned with the detection of mental stress in individuals using heart rate variability analysis. Finally, concluding remarks are provided in Section 7.

2. Mathematical Background

This section reviews the membership mappings and transferable deep learning from [14,15,22]. For a detailed mathematical study of the concepts used in this section, the readers are referred to previous works [14,15,22].

2.1. Notations

  • Let $n, N, p, M \in \mathbb{N}$.
  • Let $\mathcal{B}(\mathbb{R}^N)$ denote the Borel $\sigma$-algebra on $\mathbb{R}^N$, and let $\lambda^N$ denote the Lebesgue measure on $\mathcal{B}(\mathbb{R}^N)$.
  • Let $(\mathcal{X}, \mathcal{A}, \rho)$ be a probability space with unknown probability measure $\rho$.
  • Let us denote by $\mathcal{S}$ the set of finite samples of data points drawn i.i.d. from $\rho$, i.e., $\mathcal{S} := \{ (x^i \sim \rho)_{i=1}^{N} \mid N \in \mathbb{N} \}$.
  • For a sequence $x = (x^1, \ldots, x^N) \in \mathcal{S}$, let $|x|$ denote the cardinality, i.e., $|x| = N$.
  • If $x = (x^1, \ldots, x^N),\ a = (a^1, \ldots, a^M) \in \mathcal{S}$, then $x \oplus a$ denotes the concatenation of the sequences $x$ and $a$, i.e., $x \oplus a = (x^1, \ldots, x^N, a^1, \ldots, a^M)$.
  • Let us denote by $\mathbb{F}(\mathcal{X})$ the set of $\mathcal{A}$-$\mathcal{B}(\mathbb{R})$ measurable functions $f : \mathcal{X} \to \mathbb{R}$, i.e., $\mathbb{F}(\mathcal{X}) := \{ f : \mathcal{X} \to \mathbb{R} \mid f \ \text{is}\ \mathcal{A}\text{-}\mathcal{B}(\mathbb{R})\ \text{measurable} \}$.
  • For convenience, the values of a function $f \in \mathbb{F}(\mathcal{X})$ at points in the collection $x = (x^1, \ldots, x^N)$ are represented as $f(x) = (f(x^1), \ldots, f(x^N))$.
  • Let $\zeta_x : \mathbb{R}^{|x|} \to [0, 1]$ be a membership function satisfying the following properties:
    Nowhere Vanishing: $\zeta_x(y) > 0$ for all $y \in \mathbb{R}^{|x|}$, i.e., $\mathrm{supp}[\zeta_x] = \mathbb{R}^{|x|}$.
    Positive and Bounded Integrals: the functions $\zeta_x$ are absolutely continuous and Lebesgue integrable over the whole domain such that for all $x \in \mathcal{S}$, we have $0 < \int_{\mathbb{R}^{|x|}} \zeta_x \, \mathrm{d}\lambda^{|x|} < \infty$.
    Consistency of Induced Probability Measure: the membership-function-induced probability measures $P_{\zeta_x}$, defined on any $A \in \mathcal{B}(\mathbb{R}^{|x|})$ as
    $$P_{\zeta_x}(A) := \left( \int_{\mathbb{R}^{|x|}} \zeta_x \, \mathrm{d}\lambda^{|x|} \right)^{-1} \int_{A} \zeta_x \, \mathrm{d}\lambda^{|x|},$$
    are consistent in the sense that for all $x, a \in \mathcal{S}$: $P_{\zeta_{x \oplus a}}(A \times \mathbb{R}^{|a|}) = P_{\zeta_x}(A)$.
    The collection of membership functions satisfying the aforementioned assumptions is denoted by
    $$\Theta := \{ \zeta_x : \mathbb{R}^{|x|} \to [0, 1] \mid \zeta_x \ \text{satisfies the three properties above},\ x \in \mathcal{S} \}.$$

2.2. Review of Variational Membership Mappings

Definition 1
(Student-t Membership Mapping [14]). A Student-t membership mapping, $\mathcal{F} \in \mathbb{F}(\mathcal{X})$, is a mapping with input space $\mathcal{X} = \mathbb{R}^n$ and a membership function $\zeta_x \in \Theta$ that is Student-t like:
$$\zeta_x(y) = \left( 1 + \frac{1}{\nu - 2} (y - m_y)^T K_{xx}^{-1} (y - m_y) \right)^{-\frac{\nu + |x|}{2}}$$
where $x \in \mathcal{S}$, $y \in \mathbb{R}^{|x|}$, $\nu \in \mathbb{R}_+ \setminus [0, 2]$ is the degrees of freedom, $m_y \in \mathbb{R}^{|x|}$ is the mean vector, and $K_{xx} \in \mathbb{R}^{|x| \times |x|}$ is the covariance matrix with its $(i, j)$-th element given as
$$(K_{xx})_{i,j} = kr(x^i, x^j)$$
where $kr : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is a positive definite kernel function defined as
$$kr(x^i, x^j) = \sigma^2 \exp\left( -0.5 \sum_{k=1}^{n} w_k \left| x_k^i - x_k^j \right|^2 \right)$$
where $x_k^i$ is the $k$-th element of $x^i$, $\sigma^2$ is the variance parameter, and $w_k \ge 0$ (for $k \in \{1, \ldots, n\}$).
Given a dataset $\{(x^i, y^i) \mid x^i \in \mathbb{R}^n,\ y^i \in \mathbb{R}^p,\ i \in \{1, \ldots, N\}\}$, it is assumed that there exist zero-mean Student-t membership mappings $\mathcal{F}_1, \ldots, \mathcal{F}_p \in \mathbb{F}(\mathbb{R}^n)$ such that
$$y^i \approx \begin{bmatrix} \mathcal{F}_1(x^i) & \cdots & \mathcal{F}_p(x^i) \end{bmatrix}^T.$$
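As a concrete illustration of Definition 1, the following minimal Python sketch (illustrative only, not the article's MATLAB implementation) evaluates the kernel $kr$, the induced covariance matrix $K_{xx}$, and the Student-t-like membership function for a small sample; the variable names sigma2, w, and nu are illustrative placeholders for $\sigma^2$, $(w_1,\ldots,w_n)$, and $\nu$.

import numpy as np

def kr(xi, xj, sigma2=1.0, w=None):
    # Positive definite kernel of Definition 1: sigma^2 * exp(-0.5 * sum_k w_k |x_k^i - x_k^j|^2)
    w = np.ones_like(xi) if w is None else w
    return sigma2 * np.exp(-0.5 * np.sum(w * (xi - xj) ** 2))

def covariance_matrix(X, sigma2=1.0, w=None):
    # (K_xx)_{i,j} = kr(x^i, x^j) for the sample X with rows x^1, ..., x^N
    N = X.shape[0]
    return np.array([[kr(X[i], X[j], sigma2, w) for j in range(N)] for i in range(N)])

def student_t_membership(y, m_y, K_xx, nu=2.1):
    # Student-t-like membership: (1 + (y - m_y)^T K_xx^{-1} (y - m_y) / (nu - 2))^(-(nu + |x|)/2)
    d = y - m_y
    quad = d @ np.linalg.solve(K_xx, d)
    return (1.0 + quad / (nu - 2.0)) ** (-(nu + len(y)) / 2.0)

# Example: membership degree of a candidate function-value vector for 5 two-dimensional points
X = np.random.default_rng(0).normal(size=(5, 2))
K = covariance_matrix(X)
print(student_t_membership(np.zeros(5), m_y=np.zeros(5), K_xx=K))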
Under modeling scenario (11), [22] presents an algorithm (stated as Algorithm 1) for the variational learning of membership mappings.
Algorithm 1 Variational learning of the membership mappings [22]
Require: 
Dataset ( x i , y i ) | x i R n , y i R p , i { 1 , , N } and maximum possible number of auxiliary points M m a x Z + with M m a x N .
  1:
Choose ν and w = ( w 1 , , w n ) as in (12) and (14), respectively.
  2:
Choose a small positive value κ = 10 1 .
  3:
Set iteration count i t = 0 and M | 0 = M m a x .
  4:
while  τ ( M | i t , 1 ) < κ  do
  5:
     M | i t + 1 = 0.9 M | i t
  6:
     i t i t + 1
  7:
end while
  8:
Set M = M | i t .
  9:
if  τ ( M , 1 ) 1 p j = 1 p var y j 1 , , y j N  then
10:
     σ 2 = 1
11:
else
12:
     σ 2 = 1 τ ( M , 1 ) 1 p j = 1 p var y j 1 , , y j N
13:
end if
14:
Compute a = { a m } m = 1 M using (13), K x x using (9), K a a using (15), and K x a using (16).
15:
Set β = 1 .
16:
repeat
17:
    Compute α using (18).
18:
    Update the value of β using (19).
19:
until ( β nearly converges)
20:
Compute α using (18).
21:
return the parameters set M = { α , a , M , σ , w } .
With reference to Algorithm 1, we have following:
  • The degrees of freedom associated to the Student-t membership mapping ν R + [ 0 , 2 ] is chosen as
    ν = 2.1
  • The auxiliary inducing points are suggested to be chosen as the cluster centroids:
    a = { a m } m = 1 M = c l u s t e r _ c e n t r o i d ( { x i } i = 1 N , M )
    where c l u s t e r _ c e n t r o i d ( { x i } i = 1 N , M ) represents the k-means clustering on { x i } i = 1 N .
  • The parameters ( w 1 , , w n ) for kernel function (10) are chosen such that w k (for k { 1 , 2 , , n } ) is given as
    w k = max 1 i N x k i min 1 i N x k i 2
    where x k i is the k-th element of vector x i R n .
  • K a a R M × M and K x a R N × M are matrices with their ( i , j ) -th elements given as
    K a a i , j = k r ( a i , a j )
    K x a i , j = k r ( x i , a j )
    where k r : R n × R n R is a positive definite kernel function defined as in (10).
  • The scalar-valued function τ ( M , σ 2 ) is defined as
    τ ( M , σ 2 ) : = T r ( K x x ) T r ( ( K a a ) 1 K x a T K x a ) ν + M 2
    where a is given by (13), ν is given by (12), and parameters ( w 1 , , w n ) (which are required to evaluate the kernel function for computing matrices K x x , K a a , and K x a ) are given by (14).
  • α = α 1 α p R M × p is a matrix with its j-th column defined as
    α j : = K x a T K x a + T r ( K x x ) T r ( ( K a a ) 1 K x a T K x a ) ν + M 2 K a a + K a a β 1 ( K x a ) T y j
  • The disturbance precision value β is iteratively estimated as
    1 β = 1 p N j = 1 p i = 1 N y j i F j ( x i ) ^ 2
    where F j ( x i ) ^ is the estimated membership-mapping output given as
    F j ( x i ) ^ = G ( x i ) α j .
    Here, G ( x ) R 1 × M is a vector-valued function defined as
    G ( x ) : = k r ( x , a 1 ) k r ( x , a M )
    where k r : R n × R n R is defined as in (10).
Definition 2
(Membership-Mappings Prediction [22]). Given the parameters set M = { α , a , M , σ , w } returned by Algorithm 1, the learned membership mappings could be used to predict output corresponding to any arbitrary input data point x R n as
y ^ ( x ; M ) = α T ( G ( x ) ) T
where G ( · ) R 1 × M is a vector-valued function (21).
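A minimal sketch of the prediction rule in Definition 2 is given below (illustrative Python, not the authors' code); it assumes the parameters alpha (an M × p matrix) and the auxiliary points a (an M × n matrix) have already been obtained, e.g., by Algorithm 1, and uses the kernel form of Definition 1.

import numpy as np

def G(x, a, sigma2=1.0, w=None):
    # Vector-valued function G(x) = [kr(x, a^1), ..., kr(x, a^M)], cf. (21)
    w = np.ones_like(x) if w is None else w
    return np.array([sigma2 * np.exp(-0.5 * np.sum(w * (x - am) ** 2)) for am in a])

def predict(x, alpha, a, sigma2=1.0, w=None):
    # Membership-mappings prediction of Definition 2: y_hat(x; M) = alpha^T (G(x))^T
    return alpha.T @ G(x, a, sigma2, w)

# Example with hypothetical parameters: M = 3 auxiliary points, n = 2 inputs, p = 4 outputs
rng = np.random.default_rng(1)
a, alpha = rng.normal(size=(3, 2)), rng.normal(size=(3, 4))
print(predict(np.array([0.1, -0.2]), alpha, a))   # 4-dimensional predicted output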

2.3. Review of Membership-Mappings-Based Conditionally Deep Autoencoders

Definition 3
(Membership-Mapping Autoencoder [15]). A membership-mapping autoencoder, G : R p R p , maps an input vector y R p to G ( y ) R p such that
G ( y ) = def F 1 ( P y ) F p ( P y ) T ,
where F j ( j { 1 , 2 , , p } ) is a Student-t membership-mapping, P R n × p ( n p ) is a matrix such that the product P y is a lower-dimensional encoding for y.
Definition 4
(Conditionally Deep Membership-Mapping Autoencoder (CDMMA) [15,22]). A conditionally deep membership-mapping autoencoder, D : R p R p , maps a vector y R p to D ( y ) R p through a nested composition of finite number of membership-mapping autoencoders such that
y l = ( G l G 2 G 1 ) ( y ) , l { 1 , 2 , , L }
l * = arg min l { 1 , 2 , , L } y y l 2
D ( y ) = y l * ,
where G l ( · ) is a membership-mapping autoencoder (Definition 3).
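The layer-selection rule in Definition 4 can be sketched as follows (illustrative Python, with each layer's membership-mapping autoencoder abstracted as a callable returning a reconstruction): the CDMMA output is simply the reconstruction of the layer whose nested composition best reproduces the input.

import numpy as np

def cdmma_output(y, layer_autoencoders):
    # layer_autoencoders: list of callables G_1, ..., G_L, each mapping R^p -> R^p.
    # Compute nested compositions y_l = (G_l o ... o G_1)(y) and return the one
    # with the smallest reconstruction error ||y - y_l||^2, cf. Definition 4.
    reconstructions, z = [], y
    for G_l in layer_autoencoders:
        z = G_l(z)
        reconstructions.append(z)
    errors = [np.sum((y - y_l) ** 2) for y_l in reconstructions]
    return reconstructions[int(np.argmin(errors))]

# Toy example with two hypothetical "autoencoders" (simple shrinkage maps)
layers = [lambda v: 0.9 * v, lambda v: 0.5 * v]
print(cdmma_output(np.array([1.0, 2.0, 3.0]), layers))   # output of the better layer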
CDMMA discovers layers of increasingly abstract data representation with the lowest-level data features being modeled by the first layer and the highest-level data features being modeled by the end layer [15,22]. An algorithm (stated as Algorithm 2) has been provided in [15,22] for the variational learning of CDMMA.
Algorithm 2 Variational learning of CDMMA [15,22]
Require: 
Dataset Y = y i R p | i { 1 , , N } ; the subspace dimension n { 1 , 2 , , p } ; maximum number of auxiliary points M m a x Z + with M m a x N ; the number of layers L Z + .
1:
for  l = 1 to L do
2:
    Set subspace dimension associated to l-th layer as n l = max ( n l + 1 , 1 ) .
3:
    Define P l R n l × p such that the i-th row of P l is equal to the transpose of eigenvector corresponding to the i-th largest eigenvalue of a sample covariance matrix of dataset Y .
4:
    Define a latent variable x l , i R n l , for i { 1 , , N } , as
x l , i : = P l y i if l = 1 , P l y ^ l 1 ( x l 1 , i ; M l 1 ) if l > 1
where y ^ l 1 is the estimated output of the ( l 1 ) -th layer computed using (22) for the parameters set M l 1 = { α l 1 , a l 1 , M l 1 , σ l 1 , w l 1 } .
5:
    Define M m a x l as
M m a x l : = M m a x if l = 1 , M l 1 if l > 1
6:
    Compute parameters set M l = { α l , a l , M l , σ l , w l } , characterizing the membership mappings associated to the l-th layer, using Algorithm 1 on dataset ( x l , i , y i ) | i { 1 , , N } with the maximum possible number of auxiliary points M m a x l .
7:
end for
8:
return the parameters set M = { { M 1 , , M L } , { P 1 , , P L } } .
Definition 5
(CDMMA Filtering [15,22]). Given a CDMMA with its parameters being represented by a set M = { { M 1 , , M L } , { P 1 , , P L } } , the autoencoder can be applied for filtering a given input vector y R p as follows:
x l ( y ; M ) = P l y , l = 1 P l y ^ l 1 ( x l 1 ; M l 1 ) l 2
Here, y ^ l 1 is the output of the ( l 1 ) -th layer estimated using (22). Finally, CDMMA’s output, D ( y ; M ) , is given as
D ^ ( y ; M ) = y ^ l * ( x l * ; M l * )
l * = arg min l { 1 , , L } y y ^ l ( x l ; M l ) 2 .
For a big dataset, the computational time required by Algorithm 2 for learning will be high. To circumvent this problem, it is suggested in [15,22] that the data be partitioned into subsets and that a separate CDMMA be learned for each data subset. This motivates the definition of a wide CDMMA as in Definition 6. For the variational learning of a wide CDMMA, Algorithm 3 follows from [15,22], where the choice of the number of subsets as $S = N/1000$ is driven by the consideration that each subset then contains around 1000 data points, and processing up to around 1000 data points with a CDMMA is not computationally challenging.
Definition 6
(A Wide CDMMA [15,22]). A wide CDMMA, WD : R p R p , maps a vector y R p to WD ( y ) R p through a parallel composition of S ( S Z + ) number of CDMMAs such that
WD ( y ) = D s * ( y )
s * = arg min s { 1 , 2 , , S } y D s ( y ) 2 ,
where D s ( y ) is the output of the s-th CDMMA.
Algorithm 3 Variational learning of wide CDMMA [15,22]
Require: 
Dataset Y = y i R p | i { 1 , , N } ; the subspace dimension n { 1 , 2 , , p } ; ratio r m a x ( 0 , 1 ] ; the number of layers L Z + .
1:
Apply k-means clustering to partition Y into S subsets, { Y 1 , , Y S } , where S = N / 1000 .
2:
for  s = 1 to S do
3:
    Build a CDMMA, M s , by applying Algorithm 2 on Y s taking n as the subspace dimension; maximum number of auxiliary points as equal to r m a x × # Y s (where # Y s is the number of data points in Y s ); and L is the number of layers.
4:
end for
5:
return the parameters set P = { M s } s = 1 S .
Definition 7
(Wide CDMMA Filtering [15,22]). Given a wide CDMMA with its parameters being represented by a set P = { M s } s = 1 S , the autoencoder can be applied for filtering a given input vector y R p as follows:
WD ^ ( y ; P ) = D ^ ( y ; M s * )
s * = arg min s { 1 , 2 , , S } y D ^ ( y ; M s ) 2 ,
where D ^ ( y ; M s ) is the output of the s-th CDMMA estimated using (30).

2.4. Membership Mappings for Classification

A classifier (i.e., Definition 8) and an algorithm for its variational learning (stated as Algorithm 4) follow from [15,22].
Definition 8
(A Classifier [15,22]). A classifier, C : R p { 1 , 2 , , C } , maps a vector y R p to C ( y ) { 1 , 2 , , C } such that
C ( y ; { P c } c = 1 C ) = arg min c { 1 , 2 , , C } y WD ^ ( y ; P c ) 2
where WD ^ ( y ; P c ) , computed using (34), is the output of the c-th wide CDMMA. The classifier assigns to an input vector the label of that class whose associated autoencoder best reconstructs the input vector.
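A minimal sketch of the classification rule in Definition 8 (illustrative Python): each class c is represented by a reconstruction function (the wide CDMMA for that class, abstracted here as a callable), and the predicted label is the class whose autoencoder best reconstructs the input.

import numpy as np

def classify(y, class_reconstructors):
    # class_reconstructors[c] plays the role of WD_hat(.; P_c) for class c + 1.
    # Return the label (1-indexed, as in Definition 8) minimizing ||y - WD_hat(y; P_c)||^2.
    errors = [np.sum((y - rec(y)) ** 2) for rec in class_reconstructors]
    return int(np.argmin(errors)) + 1

# Toy example: two hypothetical class-wise reconstructors
recs = [lambda v: np.zeros_like(v),   # class 1 reconstructs everything to zero
        lambda v: 0.95 * v]           # class 2 nearly reproduces the input
print(classify(np.array([1.0, -2.0, 0.5]), recs))   # -> 2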
Algorithm 4 Variational learning of the classifier [15,22]
Require: 
Labeled dataset Y = Y c | Y c = y i , c R p | i { 1 , , N c } , c { 1 , , C } ; the subspace dimension n { 1 , , p } ; ratio r m a x ( 0 , 1 ] ; the number of layers L Z + .
1:
for  c = 1 to C do
2:
    Build a wide CDMMA, P c = { M c s } s = 1 S c , by applying Algorithm 3 on Y c for the given n, r m a x , and L.
3:
end for
4:
return the parameters set { P c } c = 1 C .

2.5. Review of Membership-Mappings-Based Privacy-Preserving Transferable Learning

A privacy-preserving semi-supervised transfer and multi-task learning problem has been recently addressed in [22] by means of variational membership mappings. The method, as suggested in [22], involves the following steps:

2.5.1. Optimal Noise Adding Mechanism for Differentially Private Classifiers

The approach suggested in [22] relies on a tailored noise-adding mechanism to achieve a given differential privacy loss bound with the minimum perturbation of the data. In particular, Algorithm 5 is suggested for a differentially private approximation of data samples, and Algorithm 6 is suggested for building a differentially private classifier.
Algorithm 5 Differentially private approximation of data samples [22]
Require: 
Dataset Y = y i R p | i { 1 , , N } ; differential privacy parameters: d R + , ϵ R + , δ ( 0 , 1 ) .
1:
A differentially private approximation of data samples is provided as
y j + i = y j i + F v j i 1 ( r j i ; ϵ , δ , d ) , r j i ( 0 , 1 )
F v j i 1 ( r j i ; ϵ , δ , d ) = d ϵ log ( 2 r j i 1 δ ) , r j i < 1 δ 2 0 , r j i [ 1 δ 2 , 1 + δ 2 ] , r j i ( 0 , 1 ) d ϵ log ( 2 ( 1 r j i ) 1 δ ) , r j i > 1 + δ 2 .
where y j + i is the j-th element of y + i R p .
2:
return  Y + = y + i R p | i { 1 , , N } .
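The noise term in Algorithm 5 is drawn by inverse-CDF sampling with $r_j^i$ uniform on (0, 1). A minimal Python sketch of this mechanism is given below (illustrative only; the piecewise inverse CDF (37) is read here as a zero-inflated Laplace-type quantile function, which is an assumption about the signs lost in the garbled source text).

import numpy as np

def dp_noise_quantile(r, eps, delta, d):
    # Piecewise inverse CDF as understood from (37): negative Laplace-type tail,
    # an atom at zero of probability mass delta, and a positive Laplace-type tail.
    if r < (1.0 - delta) / 2.0:
        return (d / eps) * np.log(2.0 * r / (1.0 - delta))
    if r > (1.0 + delta) / 2.0:
        return -(d / eps) * np.log(2.0 * (1.0 - r) / (1.0 - delta))
    return 0.0

def dp_approximate(Y, eps=1.0, delta=1e-5, d=1.0, rng=None):
    # Element-wise differentially private approximation: y+ = y + F^{-1}(r; eps, delta, d)
    rng = np.random.default_rng() if rng is None else rng
    R = rng.uniform(size=Y.shape)
    noise = np.vectorize(lambda r: dp_noise_quantile(r, eps, delta, d))(R)
    return Y + noise

Y = np.random.default_rng(2).normal(size=(4, 3))
print(dp_approximate(Y, eps=0.5))

A smaller privacy-loss bound (smaller eps) or a larger sensitivity d widens the noise tails, so the privacy guarantee is traded against the utility of the released data.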
Algorithm 6 Variational learning of a differentially private classifier [22]
Require: 
Differentially private approximated dataset: Y + = Y c + | c { 1 , , C } ; the subspace dimension n { 1 , , p } ; ratio r m a x ( 0 , 1 ] ; the number of layers L Z + .
1:
Build a classifier, { P c + } c = 1 C , by applying Algorithm 4 on Y + for the given n, r m a x , and L.
2:
return  { P c + } c = 1 C .

2.5.2. Semi-Supervised Transfer Learning Scenario

The aim is to transfer the knowledge extracted by a classifier trained using a source dataset to the classifier of the target domain such that the privacy of the source dataset is preserved. Let $\{\mathcal{Y}_c^{sr}\}_{c=1}^{C}$ be the labeled source dataset, where $\mathcal{Y}_c^{sr} = \{ y_{sr}^{i,c} \in \mathbb{R}^{p_{sr}} \mid i \in \{1, \ldots, N_c^{sr}\} \}$ represents the $c$-th labeled samples. The target dataset consists of a few labeled samples $\{\mathcal{Y}_c^{tg}\}_{c=1}^{C}$ (with $\mathcal{Y}_c^{tg} = \{ y_{tg}^{i,c} \in \mathbb{R}^{p_{tg}} \mid i \in \{1, \ldots, N_c^{tg}\} \}$) and another set of unlabeled samples $\mathcal{Y}_*^{tg} = \{ y_{tg}^{i,*} \in \mathbb{R}^{p_{tg}} \mid i \in \{1, \ldots, N_*^{tg}\} \}$.

2.5.3. Differentially Private Source Domain Classifier

For a given differential privacy parameters: d , ϵ , δ ; Algorithm 5 is applied on Y c s r to obtain the differentially private approximated data samples, Y c + s r = { y s r + i , c R p s r | i { 1 , , N c s r } } , for all c { 1 , , C } . Algorithm 6 is applied on { Y c + s r } c = 1 C to build a differentially private source domain classifier characterized by parameters sets { P c + s r } c = 1 C .

2.5.4. Latent Subspace Transformation Matrices

For a given subspace dimension $n_{st} \in \{1, 2, \ldots, \min(p_{sr}, p_{tg})\}$, the source domain transformation matrix $V_+^{sr} \in \mathbb{R}^{n_{st} \times p_{sr}}$ is defined with its $i$-th row equal to the transpose of the eigenvector corresponding to the $i$-th largest eigenvalue of the sample covariance matrix computed on the differentially private approximated source samples. The target domain transformation matrix $V^{tg} \in \mathbb{R}^{n_{st} \times p_{tg}}$ is defined with its $i$-th row equal to the transpose of the eigenvector corresponding to the $i$-th largest eigenvalue of the sample covariance matrix computed on the target samples.

2.5.5. Subspace Alignment

A target sample is mapped to source-data-space via following transformation:
y t g s r ( y t g ) = y t g , p s r = p t g ( V + s r ) T V t g y t g , p s r p t g
Both labeled and unlabeled target datasets are transformed to define the following sets:
Y c t g s r : = { y t g s r ( y t g ) | y t g Y c t g }
Y * t g s r : = { y t g s r ( y t g ) | y t g Y * t g } .
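The subspace alignment step (39) can be sketched in a few lines (illustrative Python): the PCA bases play the roles of $V_+^{sr}$ and $V^{tg}$, taken as the top eigenvectors of the respective sample covariance matrices, and a target sample is mapped to the source data space whenever the source and target dimensions differ.

import numpy as np

def pca_basis(Y, n_st):
    # Rows of the returned matrix are the eigenvectors of the sample covariance
    # corresponding to the n_st largest eigenvalues (cf. Section 2.5.4).
    C = np.cov(Y, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
    return eigvecs[:, ::-1][:, :n_st].T            # shape (n_st, p)

def align_to_source(y_tg, V_src, V_tg, p_sr, p_tg):
    # Mapping (39): identity when dimensions match, otherwise (V_src)^T V_tg y_tg
    return y_tg if p_sr == p_tg else V_src.T @ (V_tg @ y_tg)

rng = np.random.default_rng(3)
Y_src, Y_tg = rng.normal(size=(200, 6)), rng.normal(size=(150, 4))
V_src, V_tg = pca_basis(Y_src, n_st=3), pca_basis(Y_tg, n_st=3)
print(align_to_source(Y_tg[0], V_src, V_tg, p_sr=6, p_tg=4).shape)   # (6,)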

2.5.6. Target Domain Classifier

The k-th iteration for building the target domain classifier, where k { 1 , , i t _ m a x } , consists of the following updates:
{ P c t g | k } c = 1 C = Algorithm 4 Y c t g s r Y * , c t g s r | k 1 c = 1 C , n | k , r m a x , L
Y * , c t g s r | k = y t g s r i , * Y * t g s r | C ( y t g s r i , * ; { P c t g | k } c = 1 C ) = c , i { 1 , , N * t g }
where n | 1 , n | 2 , is a monotonically non-decreasing sequence.

2.5.7. source2target Model

The mapping from source to target domain is learned by means of a variational membership-mappings-based model as in the following:
M s r t g = Algorithm 1 D , M m a x
D : = WD ^ ( y ; P c + s r ) , y | y Y c t g s r Y * , c t g s r | i t _ m a x , c 1 , , C
M m a x = min ( N t g / 2 , 1000 )
where N t g = | D | is the total number of target samples, WD ^ ( · ; · ) is defined as in (34), Y c t g s r is defined as in (40), and Y * , c t g s r is defined as in (43).

2.5.8. Transfer and Multi-Task Learning

Both source and target domain classifiers are combined with the source2target model for predicting the label associated to a target sample y t g s r as
c ^ ( y t g s r ; { P c t g } c = 1 C , { P c + s r } c = 1 C , M s r t g ) = arg min c { 1 , 2 , , C } min y t g s r WD ^ ( y t g s r ; P c t g ) 2 , y t g s r y ^ WD ^ ( y t g s r ; P c + s r ) ; M s r t g 2 , y t g s r WD ^ ( y t g s r ; P c + s r ) 2 .
where y ^ · ; M s r t g is the output of the source2target model computed using (22).

3. Variational Membership-Mapping Bayesian Models

We consider the application of membership mappings to solve the inverse modeling problem related to $x = f_{t \to x}(t)$, where $f_{t \to x} : \mathbb{R}^q \to \mathbb{R}^n$ is a forward map. Specifically, a membership-mappings model is used to approximate the inverse mapping $f_{t \to x}^{-1}$.

3.1. A Prior Model

Given a dataset: { ( x i , t i ) | i { 1 , , N } } , Algorithm 1 can be used to build a membership-mappings model characterized by a set of parameters, say M x t = { α x t , a , M , σ , w } (where x t indicates the mapping from x to t has been approximated by the membership mappings). It follows from (22) that the membership-mappings model predicted output corresponding to an input x is given as
t ^ ( x ; M x t ) = ( α x t ) T ( G ( x ) ) T
where G ( · ) R 1 × M is a vector-valued function defined as in (21). The k-th element of t ^ is given as
t ^ k ( x ; M x t ) = ( G ( x ) ) α k x t
where α k x t is the k-th column of matrix α x t .
Expression (49) allows estimating for any arbitrary x the corresponding t using a membership-mappings model. This motivates introducing the following prior model:
t k = G ( x ) θ k + e k
θ k N ( α k x t , Λ k 1 )
e k N ( 0 , γ 1 )
γ Gamma ( a γ , b γ )
where k { 1 , , q } ; N ( α k x t , Λ k 1 ) is the multivariate normal distribution with mean α k x t and covariance Λ k 1 ; and Gamma ( a γ , b γ ) is the Gamma distribution with shape parameter a γ and rate parameter b γ . The estimation provided by membership-mappings model M x t (i.e., (49)) is incorporated by the prior model (50)–(53), since
E [ t k ] = t ^ k ( x ; M x t ) .

3.2. Variational Bayesian Inference

Given the dataset, { ( x i R n , t i R q ) | i { 1 , 2 , , N } } , the variational Bayesian method is considered for an inference of the stochastic model (50), with priors as (51)–(53). For all i { 1 , , N } and k { 1 , , q } , we have
t k i = G ( x i ) θ k + e k i ,
where θ k N ( α k x t , Λ k 1 ) and e k i N ( 0 , γ 1 ) . Define t k R N , e k R N , and R x R N × M as
t k = t k 1 t k N T
e k = e k 1 e k N T
R x = G ( x 1 ) T G ( x N ) T T .
For all k { 1 , , q } , we have
t k = R x θ k + e k
p ( θ k ; α k x t , Λ k ) = 1 ( 2 π ) M | ( Λ k ) 1 | exp 0.5 ( θ k α k x t ) T Λ k ( θ k α k x t )
p ( e k ; γ ) = 1 ( 2 π ) N ( γ ) N exp 0.5 γ e k 2
p ( γ ; a γ , b γ ) = b γ a γ / Γ ( a γ ) ( γ ) a γ 1 exp ( b γ γ ) .
Define the following sets:
t = { t 1 , , t q }
θ = { θ 1 , , θ q }
and consider the marginal probability of data t which is given as
p ( t ) = d θ d γ p ( t , θ , γ ) .
Let q ( θ , γ ) be an arbitrary distribution. The log marginal probability of t can be expressed as
log ( p ( t ) ) = d θ d γ q ( θ , γ ) log ( p ( t ) )
= d θ d γ q ( θ , γ ) log p ( t , θ , γ ) q ( θ , γ ) + d θ d γ q ( θ , γ ) log q ( θ , γ ) p ( θ , γ | t ) .
Define
L ( q ( θ , γ ) , t ) : = d θ d γ q ( θ , γ ) log p ( t , θ , γ ) / q ( θ , γ )
to express (66) as
log ( p ( t ) ) = L ( q ( θ , γ ) , t ) + KL ( q ( θ , γ ) p ( θ , γ | t ) )
where KL is the Kullback–Leibler divergence of $p(\theta, \gamma \mid \mathbf{t})$ from $q(\theta, \gamma)$, and $\mathcal{L}$, referred to as the negative free energy, provides a lower bound on the logarithmic evidence for the data.
The variational Bayesian approach minimizes the difference (in terms of KL divergence) between the variational and true posteriors by analytically maximizing the negative free energy $\mathcal{L}$ over the variational distributions. However, the analytical derivation requires the following widely used mean-field approximation:
q ( θ , γ ) = q ( θ ) q ( γ )
= q ( θ 1 ) q ( θ q ) q ( γ ) .
Applying the standard variational optimization technique (as in [23,24,25,26,27,28,29]), it can be verified that the optimal variational distributions maximizing L are as follows:
q * ( θ k ) = 1 ( 2 π ) M | ( Λ ^ k ) 1 | exp 0.5 ( θ k m ^ k ) T Λ ^ k ( θ k m ^ k )
q * ( γ ) = ( b ^ γ ) a ^ γ / Γ ( a ^ γ ) ( γ ) a ^ γ 1 exp ( b ^ γ γ )
where the parameters ( Λ ^ k , m ^ k , a ^ γ , b ^ γ ) satisfy the following:
Λ ^ k = Λ k + a ^ γ / b ^ γ ( R x ) T R x
m ^ k = ( Λ ^ k ) 1 Λ k α k x t + a ^ γ / b ^ γ ( R x ) T t k
a ^ γ = a γ + 0.5 q N
b ^ γ = b γ + 0.5 k = 1 q t k R x m ^ k 2 + T r ( Λ ^ k ) 1 ( R x ) T R x .
Algorithm 7 is suggested for variational Bayesian inference of the model.
Algorithm 7 Variational membership-mapping Bayesian model inference
Require: 
Dataset ( x i R n , t i R q ) | i { 1 , , N } and maximum possible number of auxiliary points M m a x Z + with M m a x N .
1:
Apply Algorithm 1 on the dataset to build a variational membership-mappings model M x t = { α x t , a , M , σ , w } .
2:
Choose non-informative priors for covariance matrix, i.e., Λ k = 10 3 I M , k { 1 , , q } .
3:
Choose non-informative priors for noise variance, i.e., a γ = 10 3 , b γ = 10 3 .
4:
Initialize a ^ γ / b ^ γ = 1 .
5:
repeat
6:
    update { Λ ^ k , m ^ k | k { 1 , , q } } , a ^ γ , b ^ γ using (74), (75), (76), (77).
7:
until convergence.
8:
return the parameters set BM x t = { { m ^ k , Λ ^ k | k { 1 , , q } } , a ^ γ , b ^ γ } .
The functionality of Algorithm 7 is as follows:
  • Step 1 builds a variational membership-mappings model using Algorithm 1 from previous work [22].
  • At steps 2 and 3, Algorithm 7 chooses relatively non-informative priors.
  • The loop between step 5 and step 7 applies variational Bayesian inference to iteratively estimate the parameters of the optimal distributions until convergence (a minimal sketch of this update loop is given below).
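The following Python sketch of the update loop in steps 5–7 of Algorithm 7 is illustrative only (not the authors' implementation). It assumes the design matrix R (N × M) with rows G(x^i), the target matrix T (N × q) with columns t_k, and the prior quantities Lambda0, Alpha0, a0, b0 are already available, and that the same non-informative prior precision is used for every output dimension (as in step 2), so a single precision matrix is shared across k.

import numpy as np

def vmmbm_updates(R, T, Lambda0, Alpha0, a0=1e-3, b0=1e-3, tol=1e-6, max_iter=200):
    # Iterate the updates (74)-(77) until the noise precision estimate a_hat/b_hat stabilizes.
    N, M = R.shape
    q = T.shape[1]
    RtR = R.T @ R
    a_hat = a0 + 0.5 * q * N                              # update (76), fixed across iterations
    ratio = 1.0                                           # initialize a_hat / b_hat = 1 (step 4)
    for _ in range(max_iter):
        Lam_hat = Lambda0 + ratio * RtR                   # update (74), shared across k here
        M_hat = np.linalg.solve(Lam_hat, Lambda0 @ Alpha0 + ratio * (R.T @ T))   # update (75)
        resid = T - R @ M_hat
        b_hat = b0 + 0.5 * (np.sum(resid ** 2)            # update (77)
                            + q * np.trace(np.linalg.solve(Lam_hat, RtR)))
        new_ratio = a_hat / b_hat
        if abs(new_ratio - ratio) < tol * max(1.0, ratio):
            break
        ratio = new_ratio
    return M_hat, Lam_hat, a_hat, b_hat

rng = np.random.default_rng(4)
R, T = rng.normal(size=(50, 5)), rng.normal(size=(50, 2))
M_hat, Lam_hat, a_hat, b_hat = vmmbm_updates(R, T, 1e-3 * np.eye(5), np.zeros((5, 2)))
print(a_hat / b_hat)   # estimated noise precision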
Remark 1
(Computational Complexity). The computational complexity of Algorithm 7 is asymptotically dominated by the computation of inverse of M × M dimensional matrix Λ ^ k in (75) to calculate m ^ k . Thus, the computational complexity of Algorithm 7 is given as O ( M 3 ) , where M is the number of auxiliary points.
The optimal distributions determined using Algorithm 7 define the so-called Variational Membership-Mapping Bayesian Model (VMMBM) as stated in Remark 2.
Remark 2
(Variational Membership-Mapping Bayesian Model (VMMBM)). The inverse mapping, f t x 1 , is approximated as
t k = G ( x ) θ k + e k ,
θ k N ( m ^ k , Λ ^ k 1 )
e k N ( 0 , γ 1 )
γ G a m m a ( a ^ γ , b ^ γ )
where k { 1 , , q } and ( m ^ k , Λ ^ k , a ^ γ , b ^ γ ) are returned by Algorithm 7.
Remark 3
(Estimation by VMMBM). Given any x * , the variational membership-mapping Bayesian model BM x t (returned by Algorithm 7) can be used to estimate corresponding t * (such that x * = f t x ( t * ) ) as
t ˜ ( x * ; BM x t ) = G ( x ) m ^ 1 G ( x ) m ^ q T .

4. Evaluation of the Information-Leakage

Consider a scenario where a variable $t$ is related to another variable $x$ through a mapping $f_{t \to x}$ such that $x = f_{t \to x}(t)$. The mutual information $I(t; x)$ measures the amount of information obtained about variable $t$ through observing variable $x$. Since $x = f_{t \to x}(t)$, the entropy $H(t)$ remains fixed independent of the mapping $f_{t \to x}$, and thus, the quantity $I(t; x) - H(t)$ is a measure of the amount of information about $t$ leaked by the mapping $f_{t \to x}$.
Definition 9
(Information Leakage). Under the scenario that $x = f_{t \to x}(t)$, a measure of the amount of information about $t$ leaked by the mapping $f_{t \to x}$ is defined as
$$IL_{f_{t \to x}} := I(t; f_{t \to x}(t)) - H(t) = I(t; x) - H(t).$$
The quantity $IL_{f_{t \to x}}$ is referred to as the information leakage.
This section is dedicated to answering the following question: how can the information leakage be calculated without knowing the data distributions?

4.1. Variational Approximation of the Information Leakage

The mutual information between $t$ and $x$ is given as
$$I(t; x) = H(t) - H(t \mid x) = H(t) + \int p(t, x) \log p(t \mid x) \, \mathrm{d}t \, \mathrm{d}x = H(t) + \langle \log p(t \mid x) \rangle_{p(t,x)}$$
where $\langle g(x) \rangle_{p(x)}$ denotes the expectation of a function $g(x)$ of the random variable $x$ with respect to the probability density function $p(x)$, and $H(t)$ and $H(t \mid x)$ are the marginal and conditional entropies, respectively. Consider the conditional probability of $t$, which is given as
$$p(t \mid x) = \int \mathrm{d}\theta \, \mathrm{d}\gamma \; p(\theta, \gamma, t \mid x)$$
where $\theta$ is a set defined as in (63). Let $q(\theta, \gamma)$ be an arbitrary distribution. The log conditional probability of $t$ can be expressed as
$$\log(p(t \mid x)) = \int \mathrm{d}\theta \, \mathrm{d}\gamma \; q(\theta, \gamma) \log p(t \mid x) = \int \mathrm{d}\theta \, \mathrm{d}\gamma \; q(\theta, \gamma) \log \frac{p(\theta, \gamma, t \mid x)}{p(\theta, \gamma \mid t, x)}$$
$$= \int \mathrm{d}\theta \, \mathrm{d}\gamma \; q(\theta, \gamma) \log \frac{p(\theta, \gamma, t \mid x)}{q(\theta, \gamma)} + \int \mathrm{d}\theta \, \mathrm{d}\gamma \; q(\theta, \gamma) \log \frac{q(\theta, \gamma)}{p(\theta, \gamma \mid t, x)}.$$
Define
$$\mathcal{L}(q(\theta, \gamma), t, x) := \int \mathrm{d}\theta \, \mathrm{d}\gamma \; q(\theta, \gamma) \log \frac{p(\theta, \gamma, t \mid x)}{q(\theta, \gamma)}$$
to express (91) as
$$\log(p(t \mid x)) = \mathcal{L}(q(\theta, \gamma), t, x) + \mathrm{KL}(q(\theta, \gamma) \,\|\, p(\theta, \gamma \mid t, x))$$
where KL is the Kullback–Leibler divergence of $p(\theta, \gamma \mid t, x)$ from $q(\theta, \gamma)$. Using (87),
$$I(t; x) = H(t) + \langle \mathcal{L}(q(\theta, \gamma), t, x) \rangle_{p(t,x)} + \langle \mathrm{KL}(q(\theta, \gamma) \,\|\, p(\theta, \gamma \mid t, x)) \rangle_{p(t,x)}.$$
That is,
$$IL_{f_{t \to x}} = \langle \mathcal{L}(q(\theta, \gamma), t, x) \rangle_{p(t,x)} + \langle \mathrm{KL}(q(\theta, \gamma) \,\|\, p(\theta, \gamma \mid t, x)) \rangle_{p(t,x)}.$$
Since the Kullback–Leibler divergence is always non-negative, it follows from (95) that $\langle \mathcal{L} \rangle_{p(t,x)}$ provides a lower bound on $IL_{f_{t \to x}}$, i.e.,
$$IL_{f_{t \to x}} \ge \langle \mathcal{L}(q(\theta, \gamma), t, x) \rangle_{p(t,x)}.$$
Our approach to approximating $IL_{f_{t \to x}}$ is to maximize its lower bound with respect to the variational distribution $q(\theta, \gamma)$. That is, we seek to solve
$$\widehat{IL}_{f_{t \to x}} = \max_{q(\theta, \gamma)} \langle \mathcal{L}(q(\theta, \gamma), t, x) \rangle_{p(t,x)}.$$
Result 1
(Analytical Expression for the Information Leakage). Given the model (78)–(81), $\widehat{IL}_{f_{t \to x}}$ is given as
$$\begin{aligned} \widehat{IL}_{f_{t \to x}} = {} & -0.5\, q \log(2\pi) + 0.5\, q \left( \digamma(\bar{a}_\gamma) - \log(\bar{b}_\gamma) \right) - \frac{\bar{a}_\gamma}{2 \bar{b}_\gamma} \sum_{k=1}^{q} \left\langle | t_k - G(x) \bar{m}_k |^2 \right\rangle_{p(t,x)} \\ & - \frac{\bar{a}_\gamma}{2 \bar{b}_\gamma} \sum_{k=1}^{q} \mathrm{Tr}\!\left( (\bar{\Lambda}_k)^{-1} \left\langle (G(x))^T G(x) \right\rangle_{p(x)} \right) \\ & - \frac{1}{2} \sum_{k=1}^{q} \left[ (\hat{m}_k - \bar{m}_k)^T \hat{\Lambda}_k (\hat{m}_k - \bar{m}_k) + \mathrm{Tr}\!\left( \hat{\Lambda}_k (\bar{\Lambda}_k)^{-1} \right) - \log \frac{|(\bar{\Lambda}_k)^{-1}|}{|(\hat{\Lambda}_k)^{-1}|} \right] + \frac{q M}{2} \\ & - \hat{a}_\gamma \log\!\left( \bar{b}_\gamma / \hat{b}_\gamma \right) + \log\!\left( \Gamma(\bar{a}_\gamma) / \Gamma(\hat{a}_\gamma) \right) - (\bar{a}_\gamma - \hat{a}_\gamma)\, \digamma(\bar{a}_\gamma) + (\bar{b}_\gamma - \hat{b}_\gamma)\, \bar{a}_\gamma / \bar{b}_\gamma. \end{aligned}$$
Here, $\digamma(\cdot)$ is the digamma function, and the parameters $(\bar{\Lambda}_k, \bar{m}_k, \bar{a}_\gamma, \bar{b}_\gamma)$ satisfy the following:
$$\bar{\Lambda}_k = \hat{\Lambda}_k + \frac{\bar{a}_\gamma}{\bar{b}_\gamma} \left\langle (G(x))^T G(x) \right\rangle_{p(x)}$$
$$\bar{m}_k = (\bar{\Lambda}_k)^{-1} \left( \hat{\Lambda}_k \hat{m}_k + \frac{\bar{a}_\gamma}{\bar{b}_\gamma} \left\langle (G(x))^T t_k \right\rangle_{p(t,x)} \right)$$
$$\bar{a}_\gamma = \hat{a}_\gamma + 0.5\, q$$
$$\bar{b}_\gamma = \hat{b}_\gamma + \frac{1}{2} \sum_{k=1}^{q} \left\langle | t_k - G(x) \bar{m}_k |^2 \right\rangle_{p(t,x)} + \frac{1}{2} \sum_{k=1}^{q} \mathrm{Tr}\!\left( (\bar{\Lambda}_k)^{-1} \left\langle (G(x))^T G(x) \right\rangle_{p(x)} \right).$$
Proof of Result 1.
Consider
L ( q ( θ , γ ) , t , x ) = log ( p ( t | θ , γ , x ) ) q ( θ , γ ) + log p ( θ , γ ) / q ( θ , γ ) q ( θ , γ ) .
It follows from (78) and (80) that
log ( p ( t k | θ k , γ , x ) ) = 0.5 log ( 2 π ) + 0.5 log ( γ ) 0.5 γ | t k G ( x ) θ k | 2 .
Since t = t 1 t q T , we have
log ( p ( t | θ , γ , x ) ) = 0.5 q log ( 2 π ) + 0.5 q log ( γ ) 0.5 γ k = 1 q | t k G ( x ) θ k | 2 .
Using (105) and (70)–(71) in (103), we have
L ( q ( θ , γ ) , t , x ) = q 2 log ( 2 π ) + q 2 log ( γ ) q ( γ ) γ q ( γ ) 2 k = 1 q | t k G ( x ) θ k | 2 q ( θ k ) + k = 1 q log p ( θ k ; m ^ k , Λ ^ k ) q ( θ k ) q ( θ k ) + log p ( γ ; a γ , b γ ) q ( γ ) q ( γ ) .
Thus,
L ( q ( θ , γ ) , t , x ) p ( t , x ) = q 2 log ( 2 π ) + q 2 log ( γ ) q ( γ ) γ q ( γ ) 2 k = 1 q | t k | 2 p ( t ) γ q ( γ ) 2 k = 1 q ( θ k ) T ( G ( x ) ) T G ( x ) p ( x ) θ k q ( θ k ) + γ q ( γ ) k = 1 q ( θ k ) T ( G ( x ) ) T t k p ( t , x ) q ( θ k ) + k = 1 q log p ( θ k ; m ^ k , Λ ^ k ) q ( θ k ) q ( θ k ) + log p ( γ ; a γ , b γ ) q ( γ ) q ( γ ) .
Now, L ( q ( θ , γ ) , t , x ) p ( t , x ) can be maximized with respect to q ( θ k ) and q ( γ ) using variational optimization. It can be seen that optimal distributions maximizing L ( q ( θ , γ ) , t , x ) p ( t , x ) are given as
q * ( θ k ) = 1 ( 2 π ) M | ( Λ ¯ k ) 1 | exp 0.5 ( θ k m ¯ k ) T Λ ¯ k ( θ k m ¯ k )
q * ( γ ) = ( b ¯ γ ) a ¯ γ / Γ ( a ¯ γ ) ( γ ) a ¯ γ 1 exp ( b ¯ γ γ )
where the parameters ( Λ ¯ k , m ¯ k , a ¯ γ , b ¯ γ ) satisfy (99)–(102). The maximum attained value of L ( q ( θ , γ ) , t , x ) p ( t , x ) is given as
max q ( θ , γ ) L ( q ( θ , γ ) , t , x ) p ( t , x ) = 0.5 q log ( 2 π ) + 0.5 q ϝ ( a ¯ γ ) log ( b ¯ γ ) a ¯ γ 2 b ¯ γ k = 1 q | t k G ( x ) m ¯ k | 2 p ( t , x ) a ¯ γ 2 b ¯ γ k = 1 q T r ( Λ ¯ k ) 1 ( G ( x ) ) T G ( x ) p ( x ) k = 1 q KL ( q * ( θ k ) p ( θ k ; m ^ k , Λ ^ k ) ) KL ( q * ( γ ) p ( γ ; a ^ γ , b ^ γ ) )
where ϝ ( · ) is the digamma function. After substituting the maximum value in (97) and calculating Kullback–Leibler divergences, we obtain (98).    □

4.2. An Algorithm for Computing the Information Leakage

Result 1 forms the basis of Algorithm 8 that computes the information leakage using available data samples.
Algorithm 8 Estimation of information leakage, I L f t x = I ( t ; x ) H ( t ) , using variational approximation
Require: 
Dataset ( x i R n , t i R q ) | x i = f t x ( t i ) , i { 1 , , N } .
1:
Apply Algorithm 7 on ( x i , t i ) | i { 1 , , N } with M m a x = min ( N / 2 , 1000 ) (i.e., constraining the maximum possible number of auxiliary points M m a x below 1000 for computational efficiency) to obtain variational membership-mappings Bayesian model BM x t = { { m ^ k , Λ ^ k | k { 1 , , q } } , a ^ γ , b ^ γ } .
2:
Initialize a ¯ / b ¯ , e.g., as a ¯ / b ¯ = a ^ / b ^ .
3:
repeat
4:
    Update { Λ ¯ k , m ¯ k | k { 1 , , q } } , a ¯ , b ¯ using (99)-(102) where expectations < · > p ( x ) and < · > p ( t , x ) are approximated via sample averages.
5:
until convergence.
6:
Compute I L ^ f t x using (98) where expectations < · > p ( x ) and < · > p ( t , x ) are approximated via sample averages.
7:
return  I L ^ f t x and the model BM x t .
The functionality of Algorithm 8 is as follows:
  • Step 1 applies Algorithm 7 for the inference of a variational membership-mappings Bayesian model.
  • The loop between step 3 and step 5 recursively estimates the parameters ( { Λ ¯ k , m ¯ k | k { 1 , , q } } , a ¯ , b ¯ ) using update rules (99)–(102).
  • Step 6 computes the information leakage using (98), with the expectations approximated by sample averages (a minimal sketch of this approximation is given below).
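The sample-average approximation used in steps 4 and 6 of Algorithm 8 can be sketched as follows (illustrative Python; the rows of R are assumed to hold G(x^i) and the rows of T the corresponding t^i). These empirical averages replace the expectations over p(x) and p(t, x) appearing in (98)–(102).

import numpy as np

def empirical_expectations(R, T, m_bar):
    # R: (N, M) with rows G(x^i); T: (N, q) with rows t^i; m_bar: (M, q) with columns m_bar_k.
    N = R.shape[0]
    E_GtG = (R.T @ R) / N                               # approximates <(G(x))^T G(x)>_{p(x)}
    E_Gt = (R.T @ T) / N                                # columns approximate <(G(x))^T t_k>_{p(t,x)}
    E_sq = np.sum((T - R @ m_bar) ** 2, axis=0) / N     # approximates <|t_k - G(x) m_bar_k|^2>_{p(t,x)}
    return E_GtG, E_Gt, E_sq

rng = np.random.default_rng(5)
R, T, m_bar = rng.normal(size=(100, 5)), rng.normal(size=(100, 3)), rng.normal(size=(5, 3))
E_GtG, E_Gt, E_sq = empirical_expectations(R, T, m_bar)
print(E_GtG.shape, E_Gt.shape, E_sq.shape)   # (5, 5) (5, 3) (3,)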
Remark 4
(Computational Complexity). The computational complexity of Algorithm 8 is asymptotically dominated by the computation of inverse of M × M dimensional matrix Λ ¯ k in (100) to calculate m ¯ k . Thus, the computational complexity of Algorithm 8 is given as O ( M 3 ) , where M is the number of auxiliary points.
Example 1
(Verification of Information Leakage Estimation Algorithm). To demonstrate the effectiveness of Algorithm 8 in estimating information leakage, a scenario is generated where $t \in \mathbb{R}^{10}$ and $x \in \mathbb{R}^{10}$ are Gaussian distributed such that $x = t + \omega$, $t \sim \mathcal{N}(0, 5 I_{10})$, and $\omega \sim \mathcal{N}(0, \sigma I_{10})$ with $\sigma \in [1, 15]$. Since the data distributions in this scenario are known, the information leakage can be theoretically calculated and is given as
$$IL_{f_{t \to x}} = 5 \log\left( 1 + \frac{5}{\sigma} \right) - 0.5 \log \left| 2 \pi e \, 5 I_{10} \right|.$$
For a given value of σ, 1000 samples of t and x were simulated and Algorithm 8 was applied for estimating information leakage. The experiments were carried out at different values of σ ranging from 1 to 15.
Figure 3 compares the plots of estimated and theoretically calculated values of information leakage against σ. A close agreement between the two plots in Figure 3 verifies the effectiveness of Algorithm 8 in estimating information leakage without knowing the data distributions.
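As a numerical check of the closed-form expression in Example 1, the theoretical information leakage can be computed directly from the Gaussian formulas; this reproduces only the analytical reference curve of Figure 3, not the estimate produced by Algorithm 8 (illustrative Python).

import numpy as np

def theoretical_IL(sigma, dim=10, var_t=5.0):
    # I(t; x) for x = t + omega with t ~ N(0, var_t I), omega ~ N(0, sigma I):
    # 0.5 * log det(I + (var_t/sigma) I) = 0.5 * dim * log(1 + var_t/sigma)
    mutual_info = 0.5 * dim * np.log(1.0 + var_t / sigma)
    # H(t) = 0.5 * log |2 pi e var_t I| = 0.5 * dim * log(2 pi e var_t)
    entropy_t = 0.5 * dim * np.log(2.0 * np.pi * np.e * var_t)
    return mutual_info - entropy_t

for sigma in (1.0, 5.0, 15.0):
    print(sigma, theoretical_IL(sigma))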

5. Information Theoretic Measures for Privacy Leakage, Interpretability, and Transferability

5.1. Definitions

To formally define the information theoretic measures for privacy leakage, interpretability, and transferability, a few variables and mappings are introduced in Table 2. Definitions 10–12 provide the mathematical definitions of the information theoretic measures.
Definition 10
(Privacy Leakage). Privacy leakage (by the mapping from private variables to the noise-added data vector) is a measure of the amount of information about the private/sensitive variable $x_{sr}$ leaked by the mapping $f_{x_{sr} \to y_{sr}^+}$ and is defined as
$$IL_{f_{x_{sr} \to y_{sr}^+}} := I\left( x_{sr};\, f_{x_{sr} \to y_{sr}^+}(x_{sr}) \right) - H(x_{sr}) = I\left( x_{sr};\, y_{sr}^+ \right) - H(x_{sr}).$$
Definition 11
(Interpretability Measure). Interpretability (of the noise-added data vector) is measured as the amount of information about the interpretable parameters $t_{sr}$ leaked by the mapping $f_{t_{sr} \to y_{sr}^+}$ and is defined as
$$IL_{f_{t_{sr} \to y_{sr}^+}} := I\left( t_{sr};\, f_{t_{sr} \to y_{sr}^+}(t_{sr}) \right) - H(t_{sr}) = I\left( t_{sr};\, y_{sr}^+ \right) - H(t_{sr}).$$
Definition 12
(Transferability Measure). Transferability (from the source domain data representation learning models (i.e., $\mathcal{P}_1^{+sr}, \ldots, \mathcal{P}_C^{+sr}$) to the target domain data representation learning models (i.e., $\mathcal{P}_1^{tg}, \ldots, \mathcal{P}_C^{tg}$)) is measured as the amount of information about the source domain feature vector $\hat{y}_{tg}^{sr}$ leaked by the mapping $f_{\hat{y}_{tg}^{sr} \to \hat{y}_{tg}^{tg}}$ and is defined as
$$IL_{f_{\hat{y}_{tg}^{sr} \to \hat{y}_{tg}^{tg}}} := I\left( \hat{y}_{tg}^{sr};\, f_{\hat{y}_{tg}^{sr} \to \hat{y}_{tg}^{tg}}(\hat{y}_{tg}^{sr}) \right) - H(\hat{y}_{tg}^{sr}) = I\left( \hat{y}_{tg}^{sr};\, \hat{y}_{tg}^{tg} \right) - H(\hat{y}_{tg}^{sr}).$$
Here, $\hat{y}_{tg}^{tg}$ represents the target domain feature vector, and $f_{\hat{y}_{tg}^{sr} \to \hat{y}_{tg}^{tg}} : \mathbb{R}^{p_{sr}} \to \mathbb{R}^{p_{sr}}$ is the mapping from the source domain feature vector $\hat{y}_{tg}^{sr}$ to the target domain feature vector $\hat{y}_{tg}^{tg}$.
Since the defined measures take the form of information leakages, Algorithm 8 can be directly applied to practically compute them, provided data samples are available.

5.2. A Unified Approach to Privacy-Preserving Interpretable and Transferable Learning

The presented theory allows us to develop an algorithm that implements privacy-preserving interpretable and transferable learning methodology in a unified manner.
Algorithm 9 is presented for a systematic implementation of the proposed privacy-preserving interpretable and transferable deep learning methodology. The functionality of Algorithm 9 is as follows:
Algorithm 9 Algorithm for privacy-preserving interpretable and transferable learning
Require: 
The labeled source dataset: Y s r = { Y c s r } c = 1 C (where Y c s r = { y s r i , c R p s r | i { 1 , , N c s r } } represents the c-th labeled samples); the set of private data: X s r = { X c s r } c = 1 C (where X c s r = { x s r R n s r | x s r = f x s r y s r 1 ( y s r ) , y s r Y c s r } ); the set of interpretable parameters: T s r = { T c s r } c = 1 C (where T c s r = { t s r R q | t s r = f t s r y s r 1 ( y s r ) , y s r Y c s r } ); the set of a few labeled target samples: { Y c t g } c = 1 C (where Y c t g = { y t g i , c R p t g | i { 1 , , N c t g } } is the set of c-th labeled target samples); the set of unlabeled target samples: Y * t g = { y t g i , * R p t g | i { 1 , , N * t g } } ; and the differential privacy parameters: d R + , ϵ R + , δ ( 0 , 1 ) .
  1:
A differentially private approximation of source dataset, Y + s r = { Y c + s r } c = 1 C , is obtained using Algorithm 5 on Y s r .
  2:
Differentially private source domain classifier, { P c + s r } c = 1 C , is built using Algorithm 6 on Y + s r taking subspace dimension as equal to min ( 20 , p s r ) (where p s r is the dimension of source data samples), ratio r m a x as equal to 0.5, and number of layers as equal to 5.
  3:
Taking subspace dimension n s t = min ( p s r / 2 , p t g ) , the source domain transformation matrix V + s r R n s t × p s r is defined as with its i-th row equal to the transpose of the eigenvector corresponding to the i-th largest eigenvalue of sample covariance matrix computed on differentially private approximated source samples. The target domain transformation matrix V t g R n s t × p t g is defined as with its i-th row equal to the transpose of the eigenvector corresponding to the i-th largest eigenvalue of the sample covariance matrix computed on target samples.
  4:
For the case of heterogeneous source and target domains, the subspace alignment approach is used to transform the target samples via (40) and (41) for defining the sets { Y c t g s r } c = 1 C and Y * t g s r .
  5:
Initial target domain classifier, { P c t g | 0 } c = 1 C , is built using Algorithm 4 on labeled target samples, { Y c t g s r } c = 1 C , taking subspace dimension as equal to min ( 20 , min 1 c C { N c t g } 1 ) (where N c t g is the number of c-th class labeled target samples), ratio r m a x as equal to 1, and number of layers as equal to 1.
  6:
The target domain classifier is updated using (42) and (43) until 4 iterations taking the monotonically non-decreasing subspace dimension n sequence as { min ( 5 , p s r ) , min ( 10 , p s r ) , min ( 15 , p s r ) , min ( 20 , p s r ) } and r m a x = 0.5 .
  7:
The mapping from source to target domain is learned by means of a model, M s r t g , defined as in (44).
  8:
Compute privacy leakage, I L f x s r y s r + , and adversary model, BM y s r + x s r , via applying Algorithm 8 on { ( y s r + , x s r ) | y s r + = f x s r y s r + ( x s r ) , x s r X s r , y s r + Y + s r } .
  9:
Compute interpretability measure, I L f t s r y s r + , and interpretability model, BM y s r + t s r , via applying Algorithm 8 on { ( y s r + , t s r ) | y s r + = f t s r y s r + ( t s r ) , t s r T s r , y s r + Y + s r } .
10:
Compute transferability measure, I L f y ^ t g s r y ^ t g t g , via applying Algorithm 8 on y ^ t g t g ( y t g ) , y ^ t g s r ( y t g ) | y t g { Y c t g } c = 1 C Y * t g , where
y ^ t g s r ( y t g ) = WD ^ y t g s r ( y t g ) ; P f y t g c ( y t g ) + s r
y ^ t g t g ( y t g ) = WD ^ y t g s r ( y t g ) ; P f y t g c ( y t g ) t g
f y t g c ( y t g ) = c ^ y t g s r ( y t g ) ; { P c t g } c = 1 C , { P c + s r } c = 1 C , M s r t g ,
y t g s r ( y t g ) is defined as in (39), and c ^ ( · ) is defined by (47).
11:
return in the source domain: classifier { P c + s r } c = 1 C ; privacy leakage I L f x s r y s r + and adversary model BM y s r + x s r ; interpretability measure I L f t s r y s r + and interpretability model BM y s r + t s r .
12:
return in the target domain: classifier { P c t g } c = 1 C .
13:
return for transfer and multi-task learning scenario: classifiers { P c + s r } c = 1 C and { P c t g } c = 1 C ; source2target model M s r t g ; latent subspace transformation matrices V + s r and V t g ; transferability measure I L f y ^ t g s r y ^ t g t g .
  • Step 2 builds the differentially private source domain classifier following Algorithm 6 from previous work [22].
  • Step 6 results in the building of the target domain classifier using the method of [22].
  • An information theoretic evaluation of privacy leakage, interpretability, and transferability is undertaken at step 8, 9, and 10, respectively.
  • Step 8 also provides the adversary model BM y s r + x s r , which can be used to estimate private data and thus to simulate privacy attacks;
  • Step 9 also provides the interpretability model BM y s r + t s r , that can be used to estimate interpretable parameters and thus provide an interpretation to the non-interpretable data vectors.

6. Experiments

Experiments have been carried out to demonstrate the application of the proposed measures (for privacy leakage, interpretability, and transferability) to privacy-preserving interpretable and transferable learning. The methodology was implemented using MATLAB R2017b, and the experiments were performed on an iMac (M1, 2021) machine with 8 GB RAM.

6.1. MNIST Dataset

The MNIST dataset contains 28 × 28 sized images divided into a training set of 60,000 images and a test set of 10,000 images. The pixel values of the images were divided by 255 to normalize them to the range from 0 to 1. The 28 × 28 normalized pixel values of each image were flattened to an equivalent 784-dimensional data vector.

6.1.1. Interpretable Parameters

For the MNIST digits dataset, there exist no additional interpretable parameters other than the pixel values. Thus, corresponding to a pixel-value vector $y \in [0,1]^{784}$, we defined an interpretable parameter vector $t \in \{0,1\}^{10}$ such that the $j$-th element $t_j = 1$ if the $j$-th class label is associated with $y$, and $t_j = 0$ otherwise. That is, in our experimental setting, the interpretable vector $t$ represents the class label assigned to the data vector $y$.
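A minimal sketch of this experimental encoding (illustrative Python): the pixel values are normalized and flattened to a 784-dimensional vector y, and the interpretable vector t is simply the one-hot encoding of the class label.

import numpy as np

def encode_sample(image_28x28, label):
    # Feature vector: pixel values scaled to [0, 1] and flattened to 784 dimensions
    y = (image_28x28.astype(float) / 255.0).reshape(-1)
    # Interpretable vector: one-hot class label, t_j = 1 iff j is the label of y
    t = np.zeros(10)
    t[label] = 1.0
    return y, t

y, t = encode_sample(np.random.default_rng(6).integers(0, 256, size=(28, 28)), label=7)
print(y.shape, t)   # (784,) and a one-hot vector with a 1 at index 7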

6.1.2. Private Data

Here, we assume that pixel values are private, i.e., x s r = y s r .

6.1.3. Semi-Supervised Transfer Learning Scenario

A transfer learning scenario was considered in the same setting as in [22,30], where the 60,000 training samples constituted the source dataset, a set of 9000 test samples constituted the target dataset, and the classification performance was evaluated on the remaining 1000 test samples. Out of the 9000 target samples, only 10 samples per class were labeled, and the remaining 8900 target samples remained unlabeled.

6.1.4. Experimental Design

Algorithm 9 is applied with the differential privacy parameters $d = 1$ and $\delta = 10^{-5}$. The experiment involves six different privacy-preserving semi-supervised transfer learning scenarios with privacy-loss bound values $\epsilon = 0.1$, $\epsilon = 0.25$, $\epsilon = 0.5$, $\epsilon = 1$, $\epsilon = 2$, and $\epsilon = 10$. For the computation of the privacy leakage, interpretability measure, and transferability measure in Algorithm 9, a subset of 5000 randomly selected samples was considered.
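The sweep over privacy budgets can be organized as in the sketch below, where `run_algorithm9` is a hypothetical placeholder for an implementation of Algorithm 9 and only the surrounding bookkeeping is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_algorithm9(epsilon, d=1, delta=1e-5, measure_subset=None):
    """Hypothetical placeholder for Algorithm 9; a real implementation would return the
    learned models together with the three information theoretic measures."""
    return {"privacy_leakage": None, "interpretability": None, "transferability": None}

# Privacy-loss bounds considered for MNIST (with d = 1 and delta = 1e-5).
epsilons = [0.1, 0.25, 0.5, 1, 2, 10]

# A random subset of 5000 source samples is used for computing the measures.
measure_subset = rng.choice(60_000, size=5_000, replace=False)

results = {eps: run_algorithm9(epsilon=eps, measure_subset=measure_subset)
           for eps in epsilons}
```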

6.1.5. Results

The experimental results are plotted in Figure 4. Figure 4a–c display the privacy–accuracy, privacy–interpretability, and privacy–transferability trade-off curves, respectively. The following observations are made:
  • As expected, and as observed in Figure 4f, the transferability measure is positively correlated with the accuracy of the source domain classifier on the target test samples.
  • Since we have defined the interpretable vector associated with a feature vector as representing the class label, positive correlations of the interpretability measure with the source domain classifier's accuracy and with the transferability measure are observed in Figure 4e and Figure 4f, respectively.
  • The results also verify the robust performance of Algorithm 9 under the transfer and multi-task learning scenario: unlike the performance of the source domain classifier, the classification performance in the transfer and multi-task learning scenario is not adversely affected by a reduction in privacy leakage, interpretability measure, or transferability measure, as observed in Figure 4a,e,f.
Table 3 reports the results obtained by the models that correspond to the minimum privacy leakage, the maximum interpretability measure, and the maximum transferability measure. The robustness of the transfer and multi-task learning scenario is further highlighted in Table 3: to achieve the minimum privacy leakage, the accuracy of the source domain classifier drops to 0.1760, whereas the transfer and multi-task learning scenario attains the minimum privacy leakage with an accuracy of 0.9510. As observed in Table 3, the maximum transferability-measure models also coincide with the maximum interpretability-measure models.
As a visualization example, Figure 5 displays noise-added data samples for different values of information theoretic measures.

6.2. Office and Caltech256 Datasets

The “Office+Caltech256” dataset comprises the 10 categories common to the Office and Caltech256 datasets. The dataset has four domains: amazon, webcam, dslr, and caltech256, and it has been widely used [31,32,33,34] for evaluating multi-class accuracy in a standard domain adaptation setting with a small number of labeled target samples. Following [32], 4096-dimensional deep-net VGG-FC6 features are extracted from the images. For the learning of classifiers, however, the 4096-dimensional feature vectors are reduced to 100-dimensional feature vectors using principal components computed from the data of the amazon domain. Thus, corresponding to each image, a 100-dimensional data vector is constructed.
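A minimal sketch of this dimensionality reduction step is given below; the feature extraction itself is omitted, and the sample counts, the mean-centering performed by scikit-learn's PCA, and the random data are assumptions of the sketch.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Random stand-ins for 4096-dimensional VGG-FC6 features (real features would be
# extracted from the Office/Caltech256 images; sample counts are arbitrary here).
amazon_features = rng.normal(size=(1_000, 4096))
webcam_features = rng.normal(size=(300, 4096))

# Principal components are computed from the amazon-domain data only and then used
# to project every domain onto a common 100-dimensional feature space.
pca = PCA(n_components=100).fit(amazon_features)
amazon_100d = pca.transform(amazon_features)
webcam_100d = pca.transform(webcam_features)
print(amazon_100d.shape, webcam_100d.shape)  # (1000, 100) (300, 100)
```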

6.2.1. Interpretable Parameters

Corresponding to a data vector $y \in \mathbb{R}^{100}$, an interpretable parameter vector $t \in \{0,1\}^{10}$ is defined such that the $j$-th element $t_j = 1$ if the $j$-th class label is associated with $y$, and $t_j = 0$ otherwise. That is, the interpretable vector $t$, in our experimental setting, represents the class label assigned to the data vector $y$.

6.2.2. Private Data

Here, we assume that the extracted image feature vectors are private, i.e., $x^{sr} = y^{sr}$.

6.2.3. Semi-Supervised Transfer Learning Scenario

Similarly to [31,32,33,34], the experimental setup is as follows:
  • The number of training samples per class in the source domain is 20 for amazon and is 8 for the other three domains;
  • The number of labeled samples per class in the target domain is 3 for all the four domains.

6.2.4. Experimental Design

Taking one domain as the source and another as the target, 12 different transfer learning experiments are performed on the four domains associated with the “Office+Caltech256” dataset. Each of the 12 experiments is repeated 20 times by creating 20 random train/test splits. In each of the 240 ( = 12 × 20 ) experiments, Algorithm 9 is applied three times with varying values of the privacy-loss bound: first with differential privacy parameters ( $d = 1$, $\epsilon = 0.01$, $\delta = 10^{-5}$ ), second with ( $d = 1$, $\epsilon = 0.1$, $\delta = 10^{-5}$ ), and third with ( $d = 1$, $\epsilon = 1$, $\delta = 10^{-5}$ ). Since Algorithm 9 with different values of the privacy-loss bound $\epsilon$ results in different models, the transfer and multi-task learning models that correspond to the maximum interpretability measure and the maximum transferability measure are considered for evaluation.
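The enumeration behind this design can be sketched as follows; the data structures and iteration order are illustrative rather than taken from the original implementation.

```python
from itertools import permutations

domains = ["amazon", "webcam", "dslr", "caltech256"]
epsilons = [0.01, 0.1, 1]   # privacy-loss bounds, each used with d = 1 and delta = 1e-5
n_splits = 20               # random train/test splits per (source, target) pair

# 12 ordered (source, target) pairs x 20 splits x 3 privacy budgets = 720 runs of
# Algorithm 9 (i.e., 240 experiments, each with three applications of the algorithm).
runs = [(src, tgt, split, eps)
        for src, tgt in permutations(domains, 2)
        for split in range(n_splits)
        for eps in epsilons]
print(len(runs))  # 720
```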

6.2.5. Reference Methods

This dataset has been studied previously [31,32,33,34,35,36] and thus, as a reference, the performances of the following existing methods were considered:
  • ILS (1-NN) [32]: This method learns an Invariant Latent Space (ILS) to reduce the discrepancy between domains and uses Riemannian optimization techniques to match statistical properties between samples projected into the latent space from different domains.
  • CDLS [35]: The Cross-Domain Landmark Selection (CDLS) method derives a domain-invariant feature subspace for heterogeneous domain adaptation.
  • MMDT [34]: The Maximum Margin Domain Transform (MMDT) method adapts max-margin classifiers in a multi-class manner by learning a shared component of the domain shift as captured by the feature transformation.
  • HFA [36]: The Heterogeneous Feature Augmentation (HFA) method learns a common latent subspace and a classifier under the max-margin framework.
  • OBTL [33]: The Optimal Bayesian Transfer Learning (OBTL) method employs a Bayesian framework to transfer learning through the modeling of a joint prior probability density function for feature-label distributions of the source and target domains.

6.2.6. Results

Tables 4–15 report the results; in each table, the first and second best performances have been marked.
Finally, Table 16 summarizes the overall performance of the top four methods. As observed in Table 16, the maximum transferability-measure model performs best in the largest number of experiments. The most remarkable result is that the proposed methodology, despite being privacy-preserving, ensuring a differential privacy-loss bound less than or equal to 1, and not requiring access to source data samples, performs better than even the non-private methods.

6.3. An Application Example: Mental Stress Detection

The mental stress detection problem is considered as an application example of the proposed privacy-preserving interpretable and transferable learning approach. The dataset from [17], consisting of heart rate interval measurements of different subjects, is considered for the study of an individual stress detection problem. In [17], a membership-mappings-based interpretable deep model was applied for the estimation of a stress score; the current study, in contrast, applies the proposed privacy-preserving interpretable and transferable deep learning method to solve the stress classification problem. The problem is concerned with the detection of stress in an individual based on the analysis of a recorded sequence of R-R intervals, $\{RR_i\}_i$. The R-R data vector at the $i$-th time-index, $y_i$, is defined as
$y_i = \begin{bmatrix} RR_i & RR_{i-1} & \cdots & RR_{i-d} \end{bmatrix}^T.$
That is, the current interval and the history of the previous $d$ intervals constitute the data vector. Assuming an average heart rate of 72 beats per minute, $d$ is chosen equal to $72 \times 3 = 216$ so that each R-R data vector covers, on average, a 3-minute-long sequence of R-R intervals. A dataset, say $\{y_i\}_i$, is built by (1) preprocessing the R-R interval sequence $\{RR_i\}_i$ with an impulse rejection filter [37] for artifact detection and (2) excluding the R-R data vectors containing artifacts from the dataset. The dataset contains a stress score on a scale from 0 to 100, and a label of either “no-stress” or “under-stress” is assigned to each $y_i$ based on the stress score. Thus, we have a binary classification problem.
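A minimal sketch of this dataset construction is given below; the artifact rule is only a crude stand-in for the impulse rejection filter of [37], and the stress-score threshold used to binarize the label is an assumption of the sketch.

```python
import numpy as np

def build_rr_dataset(rr_intervals, stress_scores, d=216, score_threshold=50.0):
    """Build R-R data vectors y_i = [RR_i, RR_{i-1}, ..., RR_{i-d}] with binary labels."""
    rr = np.asarray(rr_intervals, dtype=float)
    med = np.median(rr)
    mad = np.median(np.abs(rr - med))
    # Crude stand-in for the impulse rejection filter of [37]: flag intervals whose
    # robust deviation from the median exceeds a threshold (the threshold is assumed).
    artifact = np.abs(rr - med) / (1.483 * mad + 1e-12) > 4.0
    X, labels = [], []
    for i in range(d, len(rr)):
        if artifact[i - d:i + 1].any():
            continue  # exclude R-R data vectors containing artifacts
        X.append(rr[i - d:i + 1][::-1])  # [RR_i, RR_{i-1}, ..., RR_{i-d}]
        labels.append(1 if stress_scores[i] > score_threshold else 0)  # 1 = "under-stress"
    return np.array(X), np.array(labels)

# Synthetic R-R sequence around 0.83 s (about 72 beats per minute) and stress scores.
rng = np.random.default_rng(0)
rr_seq = 0.83 + 0.05 * rng.standard_normal(1_000)
scores = rng.uniform(0, 100, size=1_000)
X, y = build_rr_dataset(rr_seq, scores)
print(X.shape)  # (n_samples, 217): the current interval plus a history of 216 intervals
```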

6.3.1. Interpretable Parameters

Corresponding to an R-R data vector, there exists a set of interpretable parameters: mental demand, physical demand, temporal demand, own performance, effort, and frustration. These are the six components of stress acquired using the NASA Task Load Index [38], which provides a subjective assessment of stress in which an individual rates each of the six components on a scale from 0 to 100. Thus, corresponding to each 217-dimensional R-R data vector, there exists a six-dimensional interpretable parameter vector acquired using the NASA Task Load Index.

6.3.2. Private Data

Here, we assume that heart rate values are private. Since the instantaneous heart rate is given by $HR_i = 60/RR_i$, information about the private data is directly contained in the R-R data vectors.

6.3.3. Semi-Supervised Transfer Learning Scenario

Out of all the subjects, a randomly chosen subject's data serve as the source domain data. Considering every other subject's data as target domain data, the transfer learning experiment is performed independently on each target subject, where 50% of the target subject's samples are labeled and the remaining unlabeled target samples also serve as test data for evaluating the classification performance. Only the target subjects whose data contain both classes and at least 60 samples were considered for experimentation; there are in total 48 such target subjects.
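The subject-selection and labeling rule can be sketched as below; whether the 50% labeled target samples are drawn at random is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def eligible(labels: np.ndarray) -> bool:
    """A target subject is used only if its data contain both classes and at least 60 samples."""
    return len(labels) >= 60 and len(np.unique(labels)) == 2

def split_target(n_samples: int):
    """Label 50% of a target subject's samples; the rest are unlabeled and used as test data."""
    idx = rng.permutation(n_samples)
    n_labeled = n_samples // 2
    return idx[:n_labeled], idx[n_labeled:]

# Synthetic stand-in for one target subject's binary stress labels.
subject_labels = rng.integers(0, 2, size=80)
if eligible(subject_labels):
    labeled_idx, test_idx = split_target(len(subject_labels))
    print(len(labeled_idx), len(test_idx))  # 40 40
```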

6.3.4. Experimental Design

Algorithm 9 is applied with $d = 1$, $\epsilon \in \{0.1, 0.5, 1, 2, 5, 8, 20, 50, 100, \infty\}$, and $\delta = 10^{-5}$. Each of the 48 experiments thus involves 10 different privacy-preserving semi-supervised transfer learning scenarios with privacy-loss bound values $\epsilon = 0.1$, $\epsilon = 0.5$, $\epsilon = 1$, $\epsilon = 2$, $\epsilon = 5$, $\epsilon = 8$, $\epsilon = 20$, $\epsilon = 50$, $\epsilon = 100$, and $\epsilon = \infty$. The following two requirements are associated with this application example:
  • The private source domain data must be protected while transferring knowledge from source to target domain; and
  • The interpretability of the source domain model should be high.
In view of these requirements, the models corresponding to the minimum privacy leakage and to the maximum interpretability measure, among all models obtained for the 10 different choices of the differential privacy-loss bound $\epsilon$, are considered for detecting stress.
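The selection over the ten privacy budgets amounts to a simple argmin/argmax, as sketched below; the numeric values and dictionary layout are purely illustrative.

```python
# Illustrative per-epsilon results; each entry would hold the measures returned by
# Algorithm 9 for one privacy-loss bound (the values below are made up for the sketch).
results = {
    0.1: {"privacy_leakage": -4.1, "interpretability": 3.1},
    1.0: {"privacy_leakage": -0.8, "interpretability": 3.6},
    8.0: {"privacy_leakage":  0.5, "interpretability": 3.9},
}

# Keep the model trained with the budget giving minimum privacy leakage and the one
# giving maximum interpretability measure; these two models are used to detect stress.
eps_min_leakage = min(results, key=lambda e: results[e]["privacy_leakage"])
eps_max_interp = max(results, key=lambda e: results[e]["interpretability"])
print(eps_min_leakage, eps_max_interp)  # 0.1 8.0
```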

6.3.5. Results

Figure 6 summarizes the experimental results where accuracies obtained by both minimum privacy-leakage models and maximum interpretability-measure models have been displayed as box plots.
It is observed in Figure 6 that transfer and multi-task learning considerably improves the performance of the source domain classifier. Table 17 reports the median values (of privacy leakage, interpretability measure, transferability measure, and classification accuracy) obtained in the experiments on the 48 different subjects. The robust performance of the transfer and multi-task learning scenario is further observed in Table 17.
As a visualization example, Figure 7 displays the noise-added source domain heart rate interval data for different values of information theoretic measures.

7. Concluding Remarks

The paper has introduced information theoretic measures for privacy leakage, interpretability, and transferability in order to study the trade-offs among them. This is the first study to develop an information theory-based unified approach to privacy-preserving interpretable and transferable learning. The experiments have verified that the proposed measures can be used to study the trade-off curves between privacy leakage, interpretability measure, and transferability measure, and thus to optimize models for given application requirements such as minimum privacy leakage, maximum interpretability measure, or maximum transferability measure. The experimental results on the MNIST dataset showed that the transfer and multi-task learning scenario remarkably improved the accuracy from 0.1760 to 0.9510 while ensuring the minimum privacy leakage. The experiments on the Office and Caltech256 datasets indicated that the proposed methodology, despite ensuring a differential privacy-loss bound less than or equal to 1 and not requiring access to source data samples, performed better than even existing non-private methods in six out of 12 transfer learning experiments. The stress detection experiments on real-world biomedical data led to the observation that the transfer and multi-task learning scenario improved the accuracy from 0.3411 to 0.9647 (while ensuring the minimum privacy leakage) and from 0.3602 to 0.9619 (while ensuring the maximum interpretability measure). The considered unified approach to privacy-preserving interpretable and transferable learning involves membership-mappings-based conditionally deep autoencoders, although other data representation learning models could be explored. Future work includes the following:
  • Although this work has not focused on federated learning, the transfer learning approach could be easily extended to a multi-party setting, and the transferability measure could be calculated for any pair of parties.
  • The explainability of the conditionally deep autoencoders follows, similarly to [17], from estimating interpretable parameters from non-interpretable data feature vectors using the variational membership-mapping Bayesian model.
  • Furthermore, the variational membership-mapping Bayesian model quantifies the uncertainties in the estimation of the parameters of interest, which is also important for a user's trust in the model.

Author Contributions

Conceptualization, M.K.; methodology, M.K.; writing—original draft preparation, M.K.; writing—review and editing, L.F.; project administration, B.F.; funding acquisition, B.A.M. and L.F. All authors have read and agreed to the published version of the manuscript.

Funding

The research reported in this paper has been supported by the Austrian Research Promotion Agency (FFG) COMET-Modul S3AI (Security and Safety for Shared Artificial Intelligence); FFG Sub-Project PETAI (Privacy Secured Explainable and Transferable AI for Healthcare Systems); FFG Grant SMiLe (Secure Machine Learning Applications with Homomorphically Encrypted Data); FFG Grant PRIMAL (Privacy-Preserving Machine Learning for Industrial Applications); and the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry for Digital and Economic Affairs, and the State of Upper Austria in the frame of the SCCH competence center INTEGRATE [(FFG grant no. 892418)] part of the FFG COMET Competence Centers for Excellent Technologies Programme.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
TAI: Trustworthy Artificial Intelligence

References

  1. High-Level Expert Group on AI. Ethics Guidelines for Trustworthy AI; Report; European Commission: Brussels, Belgium, 2019. [Google Scholar]
  2. Floridi, L. Establishing the rules for building trustworthy AI. Nat. Mach. Intell. 2019, 1, 261–262. [Google Scholar] [CrossRef]
  3. Floridi, L.; Cowls, J. A Unified Framework of Five Principles for AI in Society. Harv. Data Sci. Rev. 2019, 1. [Google Scholar] [CrossRef]
  4. Floridi, L.; Cowls, J.; Beltrametti, M.; Chatila, R.; Chazerand, P.; Dignum, V.; Luetge, C.; Madelin, R.; Pagallo, U.; Rossi, F.; et al. AI4People—An Ethical Framework for a Good AI Society: Opportunities, Risks, Principles, and Recommendations. Minds Mach. 2018, 28, 689–707. [Google Scholar] [CrossRef]
  5. Mcknight, D.H.; Carter, M.; Thatcher, J.B.; Clay, P.F. Trust in a Specific Technology: An Investigation of Its Components and Measures. ACM Trans. Manag. Inf. Syst. 2011, 2, 1–25. [Google Scholar] [CrossRef]
  6. Thiebes, S.; Lins, S.; Sunyaev, A. Trustworthy artificial intelligence. Electron. Mark. 2020, 31, 447–464. [Google Scholar] [CrossRef]
  7. Future of Life Institute. Asilomar AI Principles. 2017. Available online: https://futureoflife.org/ai-principles/ (accessed on 19 September 2023).
  8. Université de Montréal. Montreal Declaration for a Responsible Development of AI. 2017. Available online: https://www.montrealdeclaration-responsibleai.com/the-declaration/ (accessed on 19 September 2023).
  9. UK House of Lords. AI in the UK: Ready, Willing and Able? 2017. Available online: https://publications.parliament.uk/pa/ld201719/ldselect/ldai/100/10002.htm (accessed on 19 September 2023).
  10. OECD. OECD Principles on AI. 2019. Available online: https://www.oecd.org/going-digital/ai/principles/ (accessed on 19 September 2023).
  11. Chinese National Governance Committee for the New Generation Artificial Intelligence. Governance Principles for the New Generation Artificial Intelligence–Developing Responsible Artificial Intelligence. 2019. Available online: https://www.chinadaily.com.cn/a/201906/17/WS5d07486ba3103dbf14328ab7.html (accessed on 19 September 2023).
  12. Vought, R.T. Guidance for Regulation of Artificial Intelligence Applications. 2020. Available online: https://www.whitehouse.gov/wp-content/uploads/2020/01/Draft-OMB-Memo-on-Regulation-of-AI-1-7-19.pdf (accessed on 19 September 2023).
  13. Hagendorff, T. The Ethics of AI Ethics: An Evaluation of Guidelines. Minds Mach. 2020, 30, 99–120. [Google Scholar] [CrossRef]
  14. Kumar, M.; Moser, B.; Fischer, L.; Freudenthaler, B. Membership-Mappings for Data Representation Learning: Measure Theoretic Conceptualization. In Proceedings of the Database and Expert Systems Applications—DEXA 2021 Workshops; Kotsis, G., Tjoa, A.M., Khalil, I., Moser, B., Mashkoor, A., Sametinger, J., Fensel, A., Martinez-Gil, J., Fischer, L., Czech, G., et al., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 127–137. [Google Scholar]
  15. Kumar, M.; Moser, B.; Fischer, L.; Freudenthaler, B. Membership-Mappings for Data Representation Learning: A Bregman Divergence Based Conditionally Deep Autoencoder. In Proceedings of the Database and Expert Systems Applications—DEXA 2021 Workshops; Kotsis, G., Tjoa, A.M., Khalil, I., Moser, B., Mashkoor, A., Sametinger, J., Fensel, A., Martinez-Gil, J., Fischer, L., Czech, G., et al., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 138–147. [Google Scholar]
  16. Kumar, M.; Freudenthaler, B. Fuzzy Membership Functional Analysis for Nonparametric Deep Models of Image Features. IEEE Trans. Fuzzy Syst. 2020, 28, 3345–3359. [Google Scholar] [CrossRef]
  17. Kumar, M.; Zhang, W.; Weippert, M.; Freudenthaler, B. An Explainable Fuzzy Theoretic Nonparametric Deep Model for Stress Assessment Using Heartbeat Intervals Analysis. IEEE Trans. Fuzzy Syst. 2021, 29, 3873–3886. [Google Scholar] [CrossRef]
  18. Kumar, M.; Singh, S.; Freudenthaler, B. Gaussian fuzzy theoretic analysis for variational learning of nested compositions. Int. J. Approx. Reason. 2021, 131, 1–29. [Google Scholar] [CrossRef]
  19. Zhang, W.; Kumar, M.; Ding, W.; Li, X.; Yu, J. Variational learning of deep fuzzy theoretic nonparametric model. Neurocomputing 2022, 506, 128–145. [Google Scholar] [CrossRef]
  20. Kumar, M.; Zhang, W.; Fischer, L.; Freudenthaler, B. Membership-Mappings for Practical Secure Distributed Deep Learning. IEEE Trans. Fuzzy Syst. 2023, 31, 2617–2631. [Google Scholar] [CrossRef]
  21. Zhang, Q.; Yang, J.; Zhang, W.; Kumar, M.; Liu, J.; Liu, J.; Li, X. Deep fuzzy mapping nonparametric model for real-time demand estimation in water distribution systems: A new perspective. Water Res. 2023, 241, 120145. [Google Scholar] [CrossRef] [PubMed]
  22. Kumar, M. Differentially private transferrable deep learning with membership-mappings. Adv. Comput. Intell. 2023, 3, 1. [Google Scholar] [CrossRef]
  23. Kumar, M.; Stoll, N.; Stoll, R. Variational Bayes for a Mixed Stochastic/Deterministic Fuzzy Filter. IEEE Trans. Fuzzy Syst. 2010, 18, 787–801. [Google Scholar] [CrossRef]
  24. Kumar, M.; Stoll, N.; Stoll, R.; Thurow, K. A Stochastic Framework for Robust Fuzzy Filtering and Analysis of Signals-Part I. IEEE Trans. Cybern. 2016, 46, 1118–1131. [Google Scholar] [CrossRef]
  25. Kumar, M.; Stoll, N.; Stoll, R. Stationary Fuzzy Fokker-Planck Learning and Stochastic Fuzzy Filtering. IEEE Trans. Fuzzy Syst. 2011, 19, 873–889. [Google Scholar] [CrossRef]
  26. Kumar, M.; Neubert, S.; Behrendt, S.; Rieger, A.; Weippert, M.; Stoll, N.; Thurow, K.; Stoll, R. Stress Monitoring Based on Stochastic Fuzzy Analysis of Heartbeat Intervals. IEEE Trans. Fuzzy Syst. 2012, 20, 746–759. [Google Scholar] [CrossRef]
  27. Kumar, M.; Insan, A.; Stoll, N.; Thurow, K.; Stoll, R. Stochastic Fuzzy Modeling for Ear Imaging Based Child Identification. IEEE Trans. Syst. Man Cybern. Syst. 2016, 46, 1265–1278. [Google Scholar] [CrossRef]
  28. Kumar, M.; Rossbory, M.; Moser, B.A.; Freudenthaler, B. An optimal (ϵ, δ)—Differentially private learning of distributed deep fuzzy models. Inf. Sci. 2021, 546, 87–120. [Google Scholar] [CrossRef]
  29. Kumar, M.; Brunner, D.; Moser, B.A.; Freudenthaler, B. Variational Optimization of Informational Privacy. In Proceedings of the Database and Expert Systems Applications; Kotsis, G., Tjoa, A.M., Khalil, I., Fischer, L., Moser, B., Mashkoor, A., Sametinger, J., Fensel, A., Martinez-Gil, J., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 32–47. [Google Scholar]
  30. Papernot, N.; Abadi, M.; Erlingsson, U.; Goodfellow, I.J.; Talwar, K. Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data. In Proceedings of the ICLR, Toulon, France, 24–26 April 2017. [Google Scholar]
  31. Hoffman, J.; Rodner, E.; Donahue, J.; Saenko, K.; Darrell, T. Efficient Learning of Domain-invariant Image Representations. arXiv 2013, arXiv:1301.3224. [Google Scholar]
  32. Herath, S.; Harandi, M.; Porikli, F. Learning an Invariant Hilbert Space for Domain Adaptation. In Proceedings of the The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  33. Karbalayghareh, A.; Qian, X.; Dougherty, E.R. Optimal Bayesian Transfer Learning. IEEE Trans. Signal Process. 2018, 66, 3724–3739. [Google Scholar] [CrossRef]
  34. Hoffman, J.; Rodner, E.; Donahue, J.; Kulis, B.; Saenko, K. Asymmetric and Category Invariant Feature Transformations for Domain Adaptation. Int. J. Comput. Vis. 2014, 109, 28–41. [Google Scholar] [CrossRef]
  35. Tsai, Y.H.; Yeh, Y.; Wang, Y.F. Learning Cross-Domain Landmarks for Heterogeneous Domain Adaptation. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 5081–5090. [Google Scholar]
  36. Li, W.; Duan, L.; Xu, D.; Tsang, I.W. Learning with Augmented Features for Supervised and Semi-Supervised Heterogeneous Domain Adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1134–1148. [Google Scholar] [CrossRef] [PubMed]
  37. McNames, J.; Thong, T.; Aboy, M. Impulse rejection filter for artifact removal in spectral analysis of biomedical signals. In Proceedings of the The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Francisco, CA, USA, 1–5 September 2004; Volume 1, pp. 145–148. [Google Scholar] [CrossRef]
  38. Hart, S.G.; Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Hum. Ment. Workload. 1988, 1, 139–183. [Google Scholar]
Figure 1. An information theoretic unified approach to “privacy-preserving interpretable and transferable learning” for studying the privacy–interpretability–transferability trade-offs while addressing beneficence, non-maleficence, autonomy, justice, and explicability principles of TAI.
Figure 2. The proposed methodology to evaluate privacy leakage, interpretability, and transferability in terms of the information leakages.
Figure 3. A comparison of the estimated information leakage values with the theoretically calculated values.
Figure 4. The plots between privacy leakage, interpretability measure, transferability measure, and accuracy for MNIST dataset.
Figure 5. An example of a source domain sample corresponding to different levels of privacy leakage, interpretability measure, and transferability measure.
Figure 6. The box plots of accuracies obtained in detecting mental stress on 48 different subjects.
Figure 7. A display of source domain R-R interval data corresponding to different levels of privacy leakage, interpretability measure, and transferability measure.
Table 1. Core issues with TAI principles and solution approaches.
TAI Principle | Issue | Solution Approach
Beneficence | I1: non-availability of large high-quality training data | transfer learning
Beneficence | I2: models (intellectual properties) are not widely available | federated learning
Non-maleficence | I3: leakage of private information embedded in training data | privacy-preserving data release mechanism
Non-maleficence | I4: leakage of private information embedded in model parameters and model outputs | privacy-preserving machine and deep learning
Autonomy | I5: user's inability to quantify model uncertainties leads to indecisiveness regarding the level of autonomy given to an AI system | analytical quantification of model uncertainties
Justice | I6: bias of training data toward certain groups of people leads to discrimination | federated learning
Explicability | I7: user's inability to understand model functionality leads to mistrust and obstruction in establishing accountability | interpretable machine and deep learning models
Table 2. Introduced variables and mappings.
Symbol/Mapping | Definition/Meaning
$x^{sr} \in \mathbb{R}^{n^{sr}}$ | vector representing private/sensitive variables associated to the source domain
$y^{sr} \in \mathbb{R}^{p^{sr}}$ | source domain data vector
$t^{sr} \in \mathbb{R}^{q}$ | vector representing the set of interpretable parameters associated to the non-interpretable data vector $y^{sr}$
$y^{sr+} \in \mathbb{R}^{p^{sr}}$ | noise-added data vector (that is either publicly released or used for the training of the source model) obtained from $y^{sr}$ via Algorithm 5
$f_{x^{sr} \to y^{sr+}}: \mathbb{R}^{n^{sr}} \to \mathbb{R}^{p^{sr}}$ | mapping from private variables to the noise-added data vector, i.e., $y^{sr+} = f_{x^{sr} \to y^{sr+}}(x^{sr})$
$f_{t^{sr} \to y^{sr+}}: \mathbb{R}^{q} \to \mathbb{R}^{p^{sr}}$ | mapping from interpretable parameters to the noise-added data vector, i.e., $y^{sr+} = f_{t^{sr} \to y^{sr+}}(t^{sr})$
$\{\mathcal{P}_c^{+sr}\}_{c=1}^{C}$ | differentially private source domain autoencoders, representing the data features of each of $C$ classes, obtained via Algorithm 6
$y^{tg} \in \mathbb{R}^{p^{tg}}$ | target domain data vector
$y^{tg \to sr} \in \mathbb{R}^{p^{sr}}$ | representation of the target domain data vector $y^{tg}$ in the source domain via transformation (39)
$\{\mathcal{P}_c^{tg}\}_{c=1}^{C}$ | target domain autoencoders, representing the data features of each of $C$ classes, obtained via Algorithm 6
$f_{y^{tg} \to c}: \mathbb{R}^{p^{tg}} \to \{1, \ldots, C\}$ | mapping assigning a class label to the target domain data vector $y^{tg}$ via (47), i.e., $f_{y^{tg} \to c}(y^{tg}) = \hat{c}\left(y^{tg \to sr}(y^{tg});\, \{\mathcal{P}_c^{tg}\}_{c=1}^{C}, \{\mathcal{P}_c^{+sr}\}_{c=1}^{C}, \mathcal{M}^{sr \to tg}\right)$
$\hat{y}^{tg \to sr} \in \mathbb{R}^{p^{sr}}$ | transformation of $y^{tg}$ to the source domain and filtering through the autoencoder that represents the source domain feature vectors of the same class as that of $y^{tg}$, i.e., $\hat{y}^{tg \to sr} = \widehat{\mathcal{WD}}\left(y^{tg \to sr}(y^{tg});\, \mathcal{P}^{+sr}_{f_{y^{tg} \to c}(y^{tg})}\right)$
$\hat{y}^{tg \to tg} \in \mathbb{R}^{p^{sr}}$ | transformation of $y^{tg}$ to the source domain and filtering through the autoencoder that represents the target domain feature vectors of the same class as that of $y^{tg}$, i.e., $\hat{y}^{tg \to tg} = \widehat{\mathcal{WD}}\left(y^{tg \to sr}(y^{tg});\, \mathcal{P}^{tg}_{f_{y^{tg} \to c}(y^{tg})}\right)$
$f_{\hat{y}^{tg \to sr} \to \hat{y}^{tg \to tg}}: \mathbb{R}^{p^{sr}} \to \mathbb{R}^{p^{sr}}$ | mapping from the source domain feature vector $\hat{y}^{tg \to sr}$ to the target domain feature vector $\hat{y}^{tg \to tg}$, i.e., $\hat{y}^{tg \to tg} = f_{\hat{y}^{tg \to sr} \to \hat{y}^{tg \to tg}}\left(\hat{y}^{tg \to sr}\right)$
Table 3. Results of experiments on MNIST dataset for evaluating privacy leakage, interpretability, and transferability.
Method | Privacy Leakage | Interpretability Measure | Transferability Measure | Classification Accuracy
minimum privacy leakage (transfer and multi-task learning) | −50.72 | −2.14 | −664.52 | 0.9510
minimum privacy leakage (source domain classifier) | −50.72 | −2.14 | −664.52 | 0.1760
maximum interpretability measure (transfer and multi-task learning) | 362.83 | 5.44 | 451.93 | 0.9920
maximum interpretability measure (source domain classifier) | 362.83 | 5.44 | 451.93 | 0.9950
maximum transferability measure (transfer and multi-task learning) | 362.83 | 5.44 | 451.93 | 0.9920
maximum transferability measure (source domain classifier) | 362.83 | 5.44 | 451.93 | 0.9950
Table 4. Accuracy (in %, averaged over 20 experiments) obtained in amazon → caltech256 semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 82.6
privacy-preserving maximum transferability-measure model | VGG-FC6 | 82.6
non-private ILS (1-NN) | VGG-FC6 | 83.3
non-private CDLS | VGG-FC6 | 78.1
non-private MMDT | VGG-FC6 | 78.7
non-private HFA | VGG-FC6 | 75.5
non-private OBTL | SURF | 41.5
non-private ILS (1-NN) | SURF | 43.6
non-private CDLS | SURF | 35.3
non-private MMDT | SURF | 36.4
non-private HFA | SURF | 31.0
Table 5. Accuracy (in %, averaged over 20 experiments) obtained in amazon → dslr semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 88.5
privacy-preserving maximum transferability-measure model | VGG-FC6 | 88.7
non-private ILS (1-NN) | VGG-FC6 | 87.7
non-private CDLS | VGG-FC6 | 86.9
non-private MMDT | VGG-FC6 | 77.1
non-private HFA | VGG-FC6 | 87.1
non-private OBTL | SURF | 60.2
non-private ILS (1-NN) | SURF | 49.8
non-private CDLS | SURF | 60.4
non-private MMDT | SURF | 56.7
non-private HFA | SURF | 55.1
Table 6. Accuracy (in %, averaged over 20 experiments) obtained in amazon → webcam semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 89.3
privacy-preserving maximum transferability-measure model | VGG-FC6 | 89.3
non-private ILS (1-NN) | VGG-FC6 | 90.7
non-private CDLS | VGG-FC6 | 91.2
non-private MMDT | VGG-FC6 | 82.5
non-private HFA | VGG-FC6 | 87.9
non-private OBTL | SURF | 72.4
non-private ILS (1-NN) | SURF | 59.7
non-private CDLS | SURF | 68.7
non-private MMDT | SURF | 64.6
non-private HFA | SURF | 57.4
Table 7. Accuracy (in %, averaged over 20 experiments) obtained in caltech256 → amazon semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 92.6
privacy-preserving maximum transferability-measure model | VGG-FC6 | 92.6
non-private ILS (1-NN) | VGG-FC6 | 89.7
non-private CDLS | VGG-FC6 | 88.0
non-private MMDT | VGG-FC6 | 85.9
non-private HFA | VGG-FC6 | 86.2
non-private OBTL | SURF | 54.8
non-private ILS (1-NN) | SURF | 55.1
non-private CDLS | SURF | 50.9
non-private MMDT | SURF | 49.4
non-private HFA | SURF | 43.8
Table 8. Accuracy (in %, averaged over 20 experiments) obtained in caltech256 → dslr semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 89.1
privacy-preserving maximum transferability-measure model | VGG-FC6 | 89.1
non-private ILS (1-NN) | VGG-FC6 | 86.9
non-private CDLS | VGG-FC6 | 86.3
non-private MMDT | VGG-FC6 | 77.9
non-private HFA | VGG-FC6 | 87.0
non-private OBTL | SURF | 61.5
non-private ILS (1-NN) | SURF | 56.2
non-private CDLS | SURF | 59.8
non-private MMDT | SURF | 56.5
non-private HFA | SURF | 55.6
Table 9. Accuracy (in %, averaged over 20 experiments) obtained in caltech256 → webcam semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 87.8
privacy-preserving maximum transferability-measure model | VGG-FC6 | 87.7
non-private ILS (1-NN) | VGG-FC6 | 91.4
non-private CDLS | VGG-FC6 | 89.7
non-private MMDT | VGG-FC6 | 82.8
non-private HFA | VGG-FC6 | 86.0
non-private OBTL | SURF | 71.1
non-private ILS (1-NN) | SURF | 62.9
non-private CDLS | SURF | 66.3
non-private MMDT | SURF | 63.8
non-private HFA | SURF | 58.1
Table 10. Accuracy (in %, averaged over 20 experiments) obtained in dslr → amazon semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 91.9
privacy-preserving maximum transferability-measure model | VGG-FC6 | 91.9
non-private ILS (1-NN) | VGG-FC6 | 88.7
non-private CDLS | VGG-FC6 | 88.1
non-private MMDT | VGG-FC6 | 83.6
non-private HFA | VGG-FC6 | 85.9
non-private OBTL | SURF | 54.4
non-private ILS (1-NN) | SURF | 55.0
non-private CDLS | SURF | 50.7
non-private MMDT | SURF | 46.9
non-private HFA | SURF | 42.9
Table 11. Accuracy (in %, averaged over 20 experiments) obtained in dslr → caltech256 semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 82.9
privacy-preserving maximum transferability-measure model | VGG-FC6 | 82.9
non-private ILS (1-NN) | VGG-FC6 | 81.4
non-private CDLS | VGG-FC6 | 77.9
non-private MMDT | VGG-FC6 | 71.8
non-private HFA | VGG-FC6 | 74.8
non-private OBTL | SURF | 40.3
non-private ILS (1-NN) | SURF | 41.0
non-private CDLS | SURF | 34.9
non-private MMDT | SURF | 34.1
non-private HFA | SURF | 30.9
Table 12. Accuracy (in %, averaged over 20 experiments) obtained in dslr → webcam semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 88.9
privacy-preserving maximum transferability-measure model | VGG-FC6 | 89.0
non-private ILS (1-NN) | VGG-FC6 | 95.5
non-private CDLS | VGG-FC6 | 90.7
non-private MMDT | VGG-FC6 | 86.1
non-private HFA | VGG-FC6 | 86.9
non-private OBTL | SURF | 83.2
non-private ILS (1-NN) | SURF | 80.1
non-private CDLS | SURF | 68.5
non-private MMDT | SURF | 74.1
non-private HFA | SURF | 60.5
Table 13. Accuracy (in %, averaged over 20 experiments) obtained in webcam → amazon semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 92.3
privacy-preserving maximum transferability-measure model | VGG-FC6 | 92.3
non-private ILS (1-NN) | VGG-FC6 | 88.8
non-private CDLS | VGG-FC6 | 87.4
non-private MMDT | VGG-FC6 | 84.7
non-private HFA | VGG-FC6 | 85.1
non-private OBTL | SURF | 55.0
non-private ILS (1-NN) | SURF | 54.3
non-private CDLS | SURF | 51.8
non-private MMDT | SURF | 47.7
non-private HFA | SURF | 56.5
Table 14. Accuracy (in %, averaged over 20 experiments) obtained in webcam → caltech256 semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 81.4
privacy-preserving maximum transferability-measure model | VGG-FC6 | 81.4
non-private ILS (1-NN) | VGG-FC6 | 82.8
non-private CDLS | VGG-FC6 | 78.2
non-private MMDT | VGG-FC6 | 73.6
non-private HFA | VGG-FC6 | 74.4
non-private OBTL | SURF | 37.4
non-private ILS (1-NN) | SURF | 38.6
non-private CDLS | SURF | 33.5
non-private MMDT | SURF | 32.2
non-private HFA | SURF | 29.0
Table 15. Accuracy (in %, averaged over 20 experiments) obtained in webcam → dslr semi-supervised transfer learning experiments. The first and second best performances have been marked.
Method | Feature Type | Accuracy (%)
privacy-preserving maximum interpretability-measure model | VGG-FC6 | 90.8
privacy-preserving maximum transferability-measure model | VGG-FC6 | 90.2
non-private ILS (1-NN) | VGG-FC6 | 94.5
non-private CDLS | VGG-FC6 | 88.5
non-private MMDT | VGG-FC6 | 85.1
non-private HFA | VGG-FC6 | 87.3
non-private OBTL | SURF | 75.0
non-private ILS (1-NN) | SURF | 70.8
non-private CDLS | SURF | 60.7
non-private MMDT | SURF | 67.0
non-private HFA | SURF | 56.5
Table 16. Comparison of the methods on “Office+Caltech256” dataset.
Method | Number of Experiments in Which Method Performed Best
privacy-preserving maximum transferability-measure model | 6
privacy-preserving maximum interpretability-measure model | 5
non-private ILS (1-NN) | 5
non-private CDLS | 1
Table 17. Results (median values) obtained in stress detection experiments on a dataset consisting of heart rate interval measurements.
Method | Privacy Leakage | Interpretability Measure | Transferability Measure | Classification Accuracy
minimum privacy leakage (transfer and multi-task learning) | −3.74 | 3.47 | 291.84 | 0.9647
minimum privacy leakage (source domain classifier) | −3.74 | 3.47 | 291.84 | 0.3411
maximum interpretability measure (transfer and multi-task learning) | 0.43 | 23.92 | 773.36 | 0.9619
maximum interpretability measure (source domain classifier) | 0.43 | 23.92 | 773.36 | 0.3602
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
