1. Introduction
In the last decade, machine learning (ML) algorithms have shown remarkable potential for improving the performance and efficiency of healthcare applications. A recent study [
1] provides an overview of the benefits that machine learning brings in healthcare, including aiding doctors in their decision making, and decreasing the cost and time it takes to reach a diagnosis. Even though such solutions allow for better resource allocation and treatment selection, they are challenging to implement in real-world circumstances due to several obstacles. The same study emphasizes that one of the most significant problems is the massive amount of high-quality data that are frequently necessary to create and evaluate machine learning models.
A related issue is the ethical aspect of data collection, which requires that data sourcing for ML comply with personal information protection and privacy regulations [
2]. The GDPR establishes precise permission standards for data use in Europe, whereas HIPAA regulates healthcare data from patient records in the United States. These laws become considerably more challenging to fulfill when clinical users, e.g., due to a lack of hardware capabilities, prefer to delegate ML model development and deployment to third parties and to use the resulting models via cloud services. According to a recent survey [
3], the Machine Learning as a Service (MLaaS) paradigm has emerged as a highly scalable approach for remotely running predictive models, while raising increased security and privacy concerns. The same paper highlights that fully homomorphic encryption (HE) could be a straightforward approach that allows a third party to process encrypted data without knowing its content.
An early effort that combined HE with neural networks, involving the communication between the model owner and the data provider, is described in [
4]. CryptoNets [
5] eliminates this interaction, but it has the drawback that the encryption technique does not process real numbers. CryptoDL [
6] approximates nonlinear functions with low-degree polynomials to overcome model complexity restrictions. However, the use of approximated activation functions reduces the prediction accuracy of the model. More recent studies propose different approaches to increase the classification accuracy at the inference phase in AI-based models employing homomorphic encryption. In [
7], adopting a polynomial approximation of Google’s Swish activation function and applying batch normalization enhanced the classification performance on the MNIST and CIFAR-10 datasets. Additional optimizations are performed to reduce the level consumption of the HE scheme. J.W. Lee et al. [
8] emphasize that the most common activation functions are non-arithmetic functions (ReLU, sigmoid, leaky ReLU), which are not suited for homomorphic computing, because most HE schemes only enable addition and multiplication. They evaluate these non-arithmetic functions with adequate precision using approximation methods. In combination with multiple methods for reducing rescaling and relinearization errors, the bootstrapping strategy enables a deep learning model to be evaluated on encrypted data. According to the numerical verification, the ResNet-20 model produced equivalent results on the CIFAR-10 dataset for both encrypted and unencrypted data. The efficiency of MLaaS is drastically improved in [
9], where GPU acceleration is used to evaluate a pre-trained CNN on encrypted images from the MNIST and CIFAR-10 datasets. None of the above-mentioned methods addresses the training phase on encrypted data, due to the increased number of operations and the longer runtime; this is regarded as an open problem, especially in the case of image-based datasets. For privacy-preserving computations within deep learning models, we proposed a variant of a noise-free matrix-based homomorphic encryption method (MORE [
10]) in our earlier work [
11]. We validated the methodology using two medical data collections in addition to the MNIST dataset. The encryption step is employed during both training and inference. The experiments showed that the method provides comparable results to those obtained by unencrypted models, while having a low computational overhead. However, the changes made to the original HE scheme to allow computations on rational numbers come at a cost in terms of privacy, as it provides lower security than standard schemes. This method was further used in [
12] to design a cloud-based platform for deploying ML algorithms for wearable sensor data, focused on data privacy. We have further addressed the security compromise in [
13], where we combined a HE scheme based on modulo operations over integers [
14], an encoding scheme that enables computations on rational numbers, and a numerical optimization strategy that facilitates training with a fixed number of operations. Nevertheless, the computational overhead introduced through encoding and encryption represents a significant drawback of the method.
The comprehensive survey [
3] includes theoretical concepts, state-of-the-art capabilities, limitations, and possible applications of privacy-preserving machine learning (PPML) solutions based on HE. An overview of techniques based on other privacy-preserving primitives such as multi-party computation (MPC), differential privacy (DP) and federated learning (FL) is provided in [
15]. The authors underline that a hybrid PPML system would likely involve a trade-off between ML performance and computational overhead.
Another privacy-preserving approach that has received increasing interest is image obfuscation. In the context of PPML, it entails modifying the image so that the content becomes unintelligible while retaining the underlying information to some extent. Obfuscation methods such as mosaicing, blurring and P3 are analyzed in [
16]. Mosaicing alters parts of an image inside a window whose size is inversely related to the resolution of the obfuscated image. Blurring applies a Gaussian filter that removes details from the images. Although mosaicing and blurring make it impossible for the human eye to detect faces or digits in obfuscated images, the authors show that standard image recognition models can still extract useful information from the transformed data.
17] uses Gaussian noise to obscure only a few images in the dataset (those considered to have sensitive content). The authors emphasize that this method could affect the model performance if too many frames require protection.
The obfuscation techniques described in [
18] are variations of the mixup approach, which entails creating convex combinations of pairs of samples. The proposed approaches aim to improve the privacy of the training data, while optimizing the model accuracy and without increasing the computational cost of the training process. After mixing, the newly created sample is further obfuscated through pixel grafting, pixel shuffling, noise addition or blurring. In the same study, the authors demonstrate that metrics like SSIM (structural similarity index measure) and HaarPSI (Haar wavelet-based perceptual similarity index), which agree with human perception of image degradation, may be used for privacy assessment. Two datasets containing images of animals were used to validate the methods. The results highlight that a compromise between obfuscation and learning capabilities must always be considered. The Google Vision AI image classifier was queried with obfuscated images, and its recognition performance was lower than that of the human evaluators. Kim et al. [
19] performed an interesting study focused on privacy-preservation for medical image analysis. They proposed a client-server system in which the client protects the patient identity by deforming the input image using a system comprising a transformation generator, a segmentation network, and a discriminator. The system is trained in an end-to-end adversarial manner to solve the task of MRI brain segmentation. Being focused on enabling protection against facial recognition, the approaches presented in [
20,
21] leverage generative adversarial networks to produce more visually pleasing outputs, while providing a solid defense against deep learning-based recognition systems. In [
21], for the analyzed scenarios, the trade-off is formulated based on the privacy against face recognition versus the utility in terms of face detection.
Herein, we propose an obfuscation technique that combines variational autoencoders with non-bijective functions. The aim is to achieve a method that enables accurate model training, while ensuring privacy against human eye perception and AI-based reconstruction attacks. The experiments are constructed to reflect the perspective of a clinical user (e.g., hospital) in a specific use case (coronary angiography view classification), and the perspective of a threat actor. Because the hospital lacks the physical resources and the expertise to develop a DL classification model, the inference is performed by a third party, which is considered untrustworthy. In this scenario, this external party is a Machine Learning as a Service (MLaaS) provider who can train a DL model using the clinical data, and then make it available as a cloud service for inference. Since the patient data is considered to be sensitive and private, every angiographic frame used for training or inference is obfuscated to protect data privacy outside of the clinical environment. Conversely, a potential threat actor, who could be the MLaaS provider or an interceptor, may try to gain unauthorized access to the clinical data. The considered attack strategy is based on the training of a reconstruction model on original-obfuscated pairs of samples from a public dataset. Because the obfuscation method is considered publicly available as a black-box tool for collaborative purposes, any external entity can use the tool to obfuscate images and obtain a dataset of corresponding image pairs. Two possible attack configurations are formulated. In the first one, the threat actor knows the data source (i.e., the hospital) but is unaware of the specific data type (coronary angiography, in our case), hence the training is performed on a public dataset containing medical-related samples. Another possibility is that the attacker is a collaborative hospital which knows that the target dataset consists of coronary angiographies, and which trains the reconstruction model on its own angiographic data.
All parties other than the hospital are regarded as untrustworthy in terms of data security, and, consequently, every externalized angiographic frame is, in fact, an obfuscated image. Even the rightful receiver, in this case the MLaaS provider, is not considered honest regarding data confidentiality, which is why the proposed obfuscation method aims to be irreversible. The goal is to protect the medical images from a highly resourceful entity (in terms of both computing power and data), while allowing for the training of the desired deep learning model directly on the altered images.
The remainder of the paper is organized as follows. The obfuscation techniques, as well as the network architectures, datasets, and procedures for the suggested use case, are presented in
Section 2.
Section 3 describes the experiments performed from the perspectives of the clinical user and the threat actor, along with the findings. In
Section 4, we discuss the unique characteristics of the proposed technique, present remarks regarding its usefulness in deep learning-based applications, and finally draw the conclusions.
2. Methods and Materials
In the following, we propose a novel strategy that combines two obfuscation approaches to:
Hide the content of a sensitive image from the human eye;
Make AI-based image reconstruction challenging;
Facilitate DL model training using obfuscated images.
The first stage is to train a variational autoencoder, which uses the original (non-obfuscated) dataset as both input and target, and provides an obfuscated counterpart for each sample at the bottleneck. A detailed description of the VAE architecture, training and obfuscation process is presented in
Section 2.1. The next step is also described as a stand-alone method in
Section 2.2, where every pixel intensity value is randomly translated to another intensity value in a non-bijective manner, to alter the visual information. When the techniques are used in conjunction, the image encoded with the VAE is further obfuscated through pixel substitution, according to a non-bijective mapping function. The entire workflow is detailed in
Section 2.3. The clinical usage scenario, the dataset, and the architecture used to solve the classification task are presented in
Section 2.4.
Section 2.5 describes the procedures employed to evaluate the privacy level provided by the proposed approach against human perception and against AI-based reconstruction attacks.
2.1. Obfuscation Method Based on a Variational Autoencoder
The Variational Autoencoder [
22] considered herein is a generative model based on the work of Kingma et al. [
23]. It consists of two models that support each other: an encoder (recognition model) and a decoder (generative model). The difference between VAEs and other AEs is that the input is not encoded as a single point, but as a distribution over the latent space, from which the decoder draws random samples. Due to the reparameterization trick, which allows backpropagation through the sampling step, the two components of the VAE can be chosen to be (deep) neural networks.
Autoencoders, and by extension VAEs, generate an encoding of the inputs that allows for an accurate reconstruction. This property also ensures that the encoding contains useful information extracted from the input, and, hence, it can be employed in further DL-based analysis or model training, e.g., within an obfuscation method based on a VAE.
From a probabilistic perspective, a VAE implies approximate inference in a latent Gaussian model, where the model likelihood and the approximate posterior are parameterized by neural networks. The recognition model compresses the input data x into a dimensionally reduced latent space z, while the generative model reconstructs the data given the hidden representation z. Let us denote the encoder $q_\phi(z|x)$ and the decoder $p_\theta(x|z)$, where $\phi$ and $\theta$ represent the neural network parameters.
The latent variables z are considered to be drawn from a simple distribution, $p(z) = \mathcal{N}(0, I)$, named the prior (here, I denotes the identity matrix). The input data x have a likelihood $p_\theta(x|z)$ that is conditioned on z. As a result, a joint probability distribution over data and latent variables can be defined:
$$p_\theta(x, z) = p_\theta(x|z)\, p(z). \quad (1)$$
The aim is to calculate the posterior distribution $p_\theta(z|x)$. This can be achieved by applying Bayes’ rule:
$$p_\theta(z|x) = \frac{p_\theta(x|z)\, p(z)}{p_\theta(x)}, \quad (2)$$
where $p_\theta(x)$ can be obtained by marginalizing out z: $p_\theta(x) = \int p_\theta(x|z)\, p(z)\, dz$. Unfortunately, the integral is usually intractable [24]. As a consequence, an approximation of this posterior distribution is required.
There are two main ways for posterior approximation: applying Markov Chain Monte Carlo (MCMC) methods such as the Metropolis–Hastings algorithm [25] or Gibbs sampling [26], and variational inference (VI) [27]. The VAE uses the latter because the sampling methods converge more slowly [28]. This approach implies approximating the posterior with a family of Gaussian distributions $q_\phi(z|x) = \mathcal{N}(\mu, \sigma^2)$, where the parameters $\mu$ and $\sigma^2$ represent the mean and the variance of each hidden representation. As a result, the encoder parameterizes the approximate posterior $q_\phi(z|x)$, taking x as input data and producing the parameters $(\mu, \sigma)$ as outputs. On the other hand, the decoder parameterizes the likelihood $p_\theta(x|z)$, having the latent variables z as input and the parameters of the distribution $p_\theta(x|z)$ as output. The approximation is penalized by computing the Kullback–Leibler (KL) divergence, which measures the distance between $q_\phi(z|x)$ and $p_\theta(z|x)$.
Hereupon, the loss function minimized during training is composed of two terms: (i) the reconstruction error between the input data x and the output data $\hat{x}$, and (ii) the KL divergence between the approximate posterior and p(z), chosen to be a normal distribution:
$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z|x)}\left[-\log p_\theta(x|z)\right] + D_{KL}\left(q_\phi(z|x)\,\|\,p(z)\right). \quad (3)$$
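For concreteness, the two-term objective can be written directly in code. The snippet below is a minimal TensorFlow sketch, assuming a diagonal Gaussian posterior parameterized by a mean and a log-variance map and a mean-squared reconstruction term; the exact reconstruction loss used in our implementation is not detailed here, so this serves as an illustration rather than a reference implementation.

```python
import tensorflow as tf

def vae_loss(x, x_hat, mu, log_var):
    """Two-term VAE objective: reconstruction error plus KL divergence to N(0, I).
    `mu` and `log_var` parameterize the approximate posterior q(z|x)."""
    # (i) reconstruction error between the input x and the decoder output x_hat
    rec = tf.reduce_sum(tf.square(x - x_hat), axis=[1, 2, 3])
    # (ii) closed-form KL divergence between N(mu, sigma^2) and the prior N(0, I)
    kl = -0.5 * tf.reduce_sum(1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=[1, 2, 3])
    return tf.reduce_mean(rec + kl)
```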
The first step of our method is to train a convolutional VAE on another dataset from the same domain as the working dataset. Additionally, one of the layers is used for noise addition. At the bottleneck, the information is divided between two channels to obtain an encoded version of the input. Those channels correspond to the mean (channel 1) and standard deviation (channel 2) of the normal distribution obtained from the encoder. Any of the channels can then be used for a subsequent DL model training on obfuscated images. From the trained VAE, only the encoder is retained as a black-box obfuscation tool. As there is no need for a reconstruction once an image has been obfuscated, the decoder is discarded.
Figure 1 displays the workflow described for the obfuscation method based on a VAE.
For our experiments, the VAE is trained on the Medical MNIST dataset [
29]. The dataset contains six classes of medical images that are randomly split into training (30,000 images) and validation (12,000 images) subsets. More details about the Medical MNIST dataset are presented in
Section 2.5.
During training, the 64 × 64 images are passed through three convolutional layers of 32, 8, and 4 filters, respectively, with a 3 × 3 receptive field. ReLU is the activation function chosen for each layer. The architecture of the decoder consists of three convolutional, ReLU activated layers of 4, 8, and 32 filters, followed by one dense layer. The VAE is trained for 10 epochs.
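A minimal Keras sketch of the encoder is given below. The filter counts follow the description above; the two-channel bottleneck head, the use of "same" padding and the log-variance parameterization are assumptions made to keep the example self-contained, since these details are not listed in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(64, 64, 1))
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
x = layers.Conv2D(4, 3, activation="relu", padding="same")(x)

# Two-channel bottleneck: channel 1 is interpreted as the mean and channel 2 as
# the (log-)standard deviation of the latent Gaussian distribution.
stats = layers.Conv2D(2, 3, padding="same")(x)
mu = layers.Lambda(lambda t: t[..., 0:1])(stats)
log_var = layers.Lambda(lambda t: t[..., 1:2])(stats)

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I), so that
# gradients can flow through the sampling step during training.
z = layers.Lambda(
    lambda t: t[0] + tf.exp(0.5 * t[1]) * tf.random.normal(tf.shape(t[0]))
)([mu, log_var])

# The decoder (three ReLU convolutional layers of 4, 8 and 32 filters plus a
# dense layer) is attached to z during training and discarded afterwards; only
# this encoder is kept as the black-box obfuscation tool.
encoder = Model(inp, [mu, log_var], name="vae_encoder")
```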
The trained encoder can be used for obfuscating medical images. A channel option must be selected, depending on the desired result. The first channel, corresponding to the mean of the normal distribution, usually ensures a better privacy level than the second channel, as it does not preserve as much detailed information from the input. This, however, limits its usefulness in subsequent AI algorithms. The channel corresponding to the standard deviation of the normal distribution tends to preserve more useful information from the original image. As a result, it is preferred in cases where the obfuscated images would be used in machine learning tasks. Depending on the structure of the original image, this channel may or may not ensure the desired level of privacy. For example, in the encoding of an image with a monochromatic background, sensitive details will most probably remain visible, which could uncover the nature of the original image. Such an example is shown in
Figure 2, where the original image, representing a coronary angiography, has an almost monochromatic background. As a result, in the image obtained from channel 2, the main vessel can be seen.
2.2. Obfuscation Based on Non-Bijective Pixel Intensity Shuffling
This approach starts with a simple obfuscation technique: random pixel intensity shuffling. Every pixel intensity is randomly associated with another value from the same interval, as described by Equation (4):
$$\mathrm{map} = \mathrm{shuffle}(\mathrm{range}(0, 255)), \quad (4)$$
where $\mathrm{range}(a, b)$ is a function that returns all integer numbers between a and b, including the interval’s endpoints, and $\mathrm{shuffle}(x)$ is a function that randomly interchanges the positions of the elements of a list x inside the returned array. We call this array a map because it creates connections between each possible pixel intensity, embodied in the list of indexes of the array, and a new random value contained in the array at the corresponding position.
This association is a bijective function, because for each domain element there is only one corresponding element in the codomain. Although this operation preserves the underlying information of the images, while making them unrecognizable for the human eye, the approach is still susceptible to AI-based, statistical or even reverse engineering attacks. Presuming that an external party has access to the obfuscation algorithm in a black-box form, an unlimited number of new images can be obfuscated, and a statistical evaluation should reveal that a one-to-one mapping was used. By reversing this mapping, a potential attacker can obtain the original images with no information loss. Training a deep learning model to reconstruct the obfuscated images is another attack approach. In anticipation of this kind of attack, a second step is proposed for this obfuscation method. The bijective function is modified so that the injectivity property is lost. In other words, multiple elements of the domain will correspond to the same element of the codomain. This effect is achieved by applying the same $\mathrm{mod}(\cdot, N)$ operation on each value of the previously obtained map. Hence, the obfuscation method can be defined by a function $f : A \rightarrow B$, where $A = \{0, 1, \ldots, 255\}$ and $B = \{0, 1, \ldots, N-1\}$. When obfuscating an image, an iteration across all pixels must be performed. In Equation (5), $v = I(i, j)$ denotes the intensity of the pixel found at the $(i, j)$ coordinates in the image matrix I:
$$v = I(i, j). \quad (5)$$
This value is modified according to Equation (6), where the $\mathrm{mod}$ function represents the typical modulo operation and the v value is used as an index into the map:
$$I_{\mathrm{obf}}(i, j) = \mathrm{mod}(\mathrm{map}[v], N), \quad (6)$$
where $I_{\mathrm{obf}}$ denotes the obfuscated image.
Figure 3 synthesizes the steps proposed for this obfuscation technique. The key concept is that applying a $\mathrm{mod}(\cdot, N)$ operation limits the range of possible values to N elements. However, this is not equivalent to filtering the highest intensities, due to the previously performed random associations. Thus, more details are preserved in the images by an arbitrary but consistent replacement of pixel intensities. Since the obfuscation function is represented by a many-to-one mapping, the task of reconstructing unseen images becomes more complex and more uncertain, even for an AI-based model trained on original-obfuscated image pairs.
The N value is an adjustable parameter that improves security when set to lower values. As a function of this parameter, the underlying information is preserved to different degrees, presumably retaining enough details in the images for DL-based applications.
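The mapping can be implemented in a few lines of NumPy. The sketch below assumes 8-bit grayscale images and uses illustrative function names; as discussed later, the same map must be reused for every image of a given application.

```python
import numpy as np

def build_map(N, num_levels=256, seed=None):
    """Non-bijective intensity map: shuffle all intensities (Eq. (4)), then fold
    them into {0, ..., N-1} with a modulo operation (Eq. (6))."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.arange(num_levels))  # bijective random shuffling
    return shuffled % N                                # many-to-one after mod N

def obfuscate(image, intensity_map):
    """Substitute every pixel intensity with its mapped value (Eqs. (5)-(6))."""
    return intensity_map[image]                        # vectorized lookup over uint8 pixels

# Example usage with the modulus used later in the paper (N = 96):
# m = build_map(N=96, seed=0)
# obfuscated_frame = obfuscate(frame.astype(np.uint8), m)
```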
Figure 4 displays a comparison between an angiographic frame (Figure 4a) and the obfuscated counterparts when a bijective (Figure 4b) or non-bijective (Figure 4c–e) mapping is applied. The obfuscated samples are rescaled to the full intensity range to allow a better visual comparison.
2.3. Secure Obfuscation Algorithm
As previously explained, the security of VAE obfuscation also depends on the image itself. For images with a uniform distribution of pixel intensities, the method will not only protect the content from the human eye perception but, due to the additional noise, also make it more difficult for an AI-based model to reconstruct the original image. In contrast, the human eye would be able to discern the environment from the main structures, or even details of the structures, in a dichromatic image where two predominant intensities describe the object and the background. The noise level can vary, but this would also affect the utility of the image. Using a non-bijective function to substitute the intensities makes the obfuscated images unrecognizable by the human eye. Although the modulo operation is meant to protect against more sophisticated attacks, the success rate of an AI-based reconstruction attack depends on the value of N. The smaller this parameter is, the more difficult the reconstruction becomes. However, this implies a trade-off between privacy and utility. We integrate the strengths of each method into a new obfuscation algorithm to maximize their effectiveness. The steps are as follows, in the order in which they should be performed:
The VAE model is trained on images similar to those that will be obfuscated in the clinical use case.
All pixel intensities are randomly shuffled, and a $\mathrm{mod}(\cdot, N)$ operation is performed on each resulting value, leading to a non-bijective mapping between different intensities.
The original image is encoded using the VAE encoder.
Each pixel value of the encoded image is substituted with the corresponding value in the non-bijective map.
As a result, an obfuscated image is created, which retains the original image’s underlying relevant information and can be used for further analysis and processing (e.g., image classification). Regardless of the initial structure of an image, combining the techniques improves privacy. First, the eye perception is affected by the intensity shuffling even if, after encoding, the sensitive content is still distinguishable. Then, the protection against AI-based reconstruction attacks is ensured by the conjunction of noise and non-bijectivity. The entire obfuscation workflow is schematically depicted in
Figure 5.
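Putting the two stages together, the combined obfuscation can be sketched as follows. The encoder call, the channel choice and the rescaling of the encoded map to 8-bit intensities before the table lookup are assumptions made for illustration; how the real-valued encoder output is quantized is not specified above.

```python
import numpy as np

def secure_obfuscate(frame, encoder, intensity_map, channel=1):
    """Combined obfuscation: VAE encoding followed by non-bijective substitution.
    channel=0 selects the mean map, channel=1 the standard-deviation map."""
    # 1. Encode the original frame with the trained (black-box) VAE encoder.
    mu, log_var = encoder.predict(frame[np.newaxis, ..., np.newaxis], verbose=0)
    encoded = mu[0, ..., 0] if channel == 0 else np.exp(0.5 * log_var[0, ..., 0])
    # 2. Rescale the real-valued encoding to 8-bit intensities (assumed step).
    encoded = 255.0 * (encoded - encoded.min()) / (np.ptp(encoded) + 1e-8)
    # 3. Substitute each pixel with its value in the non-bijective map.
    return intensity_map[encoded.astype(np.uint8)]
```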
Although the underlying information of an image is preserved using this technique, an essential requirement that must be met to use multiple images in the same application (e.g., training a classifier on obfuscated images) is that the same encoder and the same shuffling map should be applied on all images (both for training and inference). The trade-off between privacy and utility can be managed by tuning certain method-specific parameters according to the needs of the use case. For the technique based on non-bijective intensity mapping, the choice of parameter
N may influence the image utility. Regarding confidentiality, a lower N implies less information retained in the obfuscated image and, thus, a more difficult image reconstruction.
Figure 6 displays an original angiographic frame and the obfuscated counterparts for each obfuscation approach. The chosen value for the modulo operator
N in
Figure 6c is 96. More examples are included in
Appendix A,
Figure A1.
2.4. Utility Level Evaluation
As the methods described above rely on reducing, to a certain degree, the information from the original images, their utility after the obfuscation must be evaluated. To perform this analysis, the same DL model is trained for multiple levels of obfuscation, including no obfuscation. The methods presented in
Section 2.1 and
Section 2.2 are employed separately and in conjunction, as described in
Section 2.3, to obfuscate an in-house dataset consisting of coronary angiography frames. The same experiment is run for multiple values of
N, ranging between 1 and 255. The utility of obfuscated images is determined by comparing the accuracy achieved on a testing dataset for different degrees of obfuscation.
The task is to train a binary classifier to distinguish between RCA and LCA views in angiographic frames.
Figure 7 depicts one sample of each category. The dataset contains 3280 coronary angiographies, balanced between the two classes. A subset of 600 images is used for validation, and another subset of 700 images is retained for evaluation purposes. The remaining 1980 angiographic frames are used for training. Augmentation techniques such as shifting, flipping, zooming and rotation are applied. The original size of the frames is 512 × 512 pixels, but experiments with different input shapes have shown that a size of 128 × 128 ensures almost no loss in classification performance at a lower computational cost. The pixel values are normalized through min-max scaling to the [0, 1] range.
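The augmentation step can be reproduced with Keras' built-in generator; the numerical ranges below are illustrative placeholders, since only the transformation types are listed above.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Shifting, flipping, zooming and rotation applied to the 128 x 128,
# [0, 1]-scaled training frames. The ranges are assumptions, not the
# settings used in the reported experiments.
train_augmenter = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)
```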
The images (obfuscated or not) are passed through four convolutional layers of 16 and 32 filters with a 3 × 3 receptive field during training. The pooling layers downsample the images by a factor of two using the maximum value within a window. After the last convolutional layer, a flattening layer is added to convert the feature maps into a vector. The fully connected layers contain 512, 1024 and 2 nodes, respectively. The ReLU function is employed as the activation function for all layers, except for the last one, where softmax activation is used. Each convolutional layer is followed by a local normalization layer [
30] to make the model more robust to image degradation. To limit overfitting, between 25% and 50% of the neuron connections are dropped through dropout layers. Furthermore, although the maximum number of epochs is set to 30, early stopping is employed if the validation loss does not decrease for 10 consecutive epochs. A learning rate scheduler is used to achieve good convergence, starting from 1 × 10
, and diminishing the value with every epoch. The workflow of an inference step using the obfuscation algorithm is depicted in
Figure 8.
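A compact Keras sketch of the classifier described above is shown below. Where the text leaves details open (the exact filter sequence, the local normalization layer of [30], and the scheduler), the sketch substitutes common defaults: a 16-16-32-32 filter sequence, batch normalization in place of local normalization, and an exponentially decaying learning rate; these are assumptions, not the reference configuration.

```python
from tensorflow.keras import layers, models, callbacks, optimizers

def build_classifier(input_shape=(128, 128, 1)):
    model = models.Sequential([layers.Input(shape=input_shape)])
    # Four 3x3 convolutional blocks, each followed by 2x2 max pooling and dropout.
    for filters in (16, 16, 32, 32):
        model.add(layers.Conv2D(filters, 3, activation="relu", padding="same"))
        model.add(layers.BatchNormalization())  # stand-in for the local normalization of [30]
        model.add(layers.MaxPooling2D(2))
        model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(1024, activation="relu"))
    model.add(layers.Dense(2, activation="softmax"))
    model.compile(optimizer=optimizers.Adam(),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Early stopping when the validation loss stops decreasing for 10 epochs,
# with a per-epoch learning rate decay (decay factor chosen for illustration).
stop = callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
schedule = callbacks.LearningRateScheduler(lambda epoch, lr: lr * 0.95)
```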
The Keras framework [
31] was used to build the convolutional neural network, and the local normalization layer is based on [
30]. The experiments were run on a computer equipped with an Intel i7 CPU (Intel, Santa Clara, CA, USA) at 4.2 GHz, 32 GB RAM and an NVIDIA GeForce GTX 1050 Ti GPU (Nvidia, Santa Clara, CA, USA) with 4 GB of dedicated memory.
2.5. Privacy Level Evaluation
To compare the degree of privacy provided by each proposed technique, we employ similarity metrics such as SSIM and PSNR (peak signal-to-noise ratio), assessed between the original and the corresponding obfuscated images. As stated in [
18], SSIM is an image quality metric that can quantify image privacy. It considers perceptual phenomena like brightness and contrast, as well as structural information changes. SSIM can take values between 0 and 1, where 0 means no structural similarity, and 1 indicates identical images. Therefore, lower values correspond to an increased security. PSNR is expressed using the decibel scale, and typical values for good quality images (with a bit depth of 8) are between 30 and 50 dB. As a result, values below the lower threshold indicate that the image is protected against human perception. The entire testing subset owned by the hypothetical clinical user is employed for this evaluation. The averaged results are presented in
Section 3.3.
Two possible attack configurations are considered to assess the level of security against AI-based reconstruction. The considered scenario is that of an external party willing to access the original data sent by the hospital or by a specific patient. The general assumption is that the obfuscation algorithm used by the hospital is publicly available as a black-box tool. The privacy parameter N is also presumed to be known. This means that another clinical user or an MLaaS provider, or even an external interceptor can use the tool to obfuscate images and obtain a dataset of corresponding image pairs. Moreover, because the data source is known, the threat actor might guess that the dataset consists of medical images. The workflow of an entity willing to gain unauthorized access to the data has the following steps: obfuscating a dataset of medical images using the same obfuscation tool as the hospital, training a deep learning model to reconstruct the original frames from the obfuscated images, intercepting obfuscated images, and reconstructing the original images using the previously trained model.
In the first attack configuration, the interceptor assumes that the targeted data contains medical images but is unaware of their type; therefore, the malicious actor trains the reconstruction model using a publicly available dataset with different medical-related classes. In the following experiments (see
Section 3.3), the reconstruction model is trained using the Medical MNIST dataset [
29]. It contains six classes of medical images (abdomen CT, breast MRI, CXR, chest CT, hand radiography, head CT), each class totalling around 7000 samples. All 40,954 images are used for training, and the evaluation is performed on the intercepted obfuscated dataset. The Medical MNIST images have a size of 64 × 64 pixels, but they are resized to 128 × 128, the dimensions of the frames sent by the hospital.
Figure 9 depicts a sample of each category of the Medical MNIST dataset.
Another possibility is that the type of the medical images is well known, so a similar dataset is used to train the reconstruction model. For example, two clinical partners want to create an aggregated dataset containing coronary angiographies for training a view classification model, but they both wish to keep their data confidential. However, one of the partners is willing to obtain the content provided by the other. As they both use the same obfuscation tool, the threat actor obfuscates its own angiographic dataset and uses it to train a reconstruction model. Then, the malicious actor intercepts the obfuscated frames of the victim and tries to undo the obfuscation. The (in-house) dataset used in these experiments contains 8365 angiographies (5779 LCA and 2586 RCA), all employed for training. Their original size (512 × 512) is reduced to 128 × 128.
Before training, both the inputs (obfuscated images) and the targets (original images) are normalized through min-max scaling in the [0, 1] interval. The U-Net architecture introduced in [
32] is employed for reconstruction. The first half of the network, which behaves like an encoder, consists of convolutional and pooling layers that perform downsampling. Each decoder block combines its input with information from the corresponding encoder block, and performs convolutional and upsampling operations. The same activation function, number of filters, kernel size, pooling window and stride as in the original paper were used. The batch size and momentum values were set to 1 and 0.99, respectively. The model was trained for 30 epochs with a learning rate of 0.001. The architecture was implemented in the PyTorch framework [
33], and the models were trained on a machine equipped with 128 GB RAM and NVIDIA GeForce GTX 1080 Ti GPU with 11 GB of dedicated memory.
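For reference, a reduced two-level U-Net-style reconstruction network is sketched below in PyTorch; the actual experiments use the full architecture of [32], and the MSE objective shown in the comment is an assumption, since the reconstruction loss is not stated explicitly.

```python
import torch
import torch.nn as nn

def block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, as in the standard U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class SmallUNet(nn.Module):
    """Reduced two-level variant for illustration; not the full network of [32]."""
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = block(1, 64), block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = block(128, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = block(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = block(128, 64)
        self.out = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)                      # skip connection 1 (128 x 128)
        e2 = self.enc2(self.pool(e1))          # skip connection 2 (64 x 64)
        b = self.bottleneck(self.pool(e2))     # 32 x 32
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.out(d1))     # reconstruction in [0, 1]

# Training sketch: SGD with momentum 0.99 and lr = 1e-3, batch size 1, and an
# assumed MSE loss between the reconstruction and the original frame.
# model = SmallUNet()
# opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.99)
# loss = nn.MSELoss()(model(obfuscated_batch), original_batch)
```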
The reconstruction network was trained on images obfuscated using the methods described in
Section 2.1 and
Section 2.2, and the algorithm presented in
Section 2.3 for multiple values of the parameter
N. To determine the degree of similarity between the reconstructed images and the original counterparts, SSIM and PSNR are computed across all frames sent by the victim (the training dataset of the classifier). Considering the threshold values of SSIM, in the results presented in
Section 3.3, a lower SSIM value denotes a poor reconstruction performance and a high privacy level. Regarding the interpretation of PSNR, in the following experiments, values under 30 dB indicate inaccurate reconstruction and high security. The scikit-image library [
34] was employed for computing the similarity metrics.
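The two similarity metrics can be computed directly with scikit-image; the sketch below assumes single-channel images already scaled to [0, 1].

```python
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def privacy_metrics(original, reconstructed):
    """SSIM and PSNR between an original frame and its reconstruction,
    both given as 2-D arrays with values in [0, 1]."""
    ssim = structural_similarity(original, reconstructed, data_range=1.0)
    psnr = peak_signal_noise_ratio(original, reconstructed, data_range=1.0)
    return ssim, psnr
```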
Expert readers manually performed a visual assessment to determine to what extent the reconstructed images are protected against human perception. The assessment was performed on 50 frames (25 LCA, 25 RCA). Since in most cases the background was reconstructed more accurately than the arteries, two separate scores were assigned for each image. A scale from 1 to 5 was chosen, where 1 indicates that the object was not reconstructed at all and 5 denotes a visual similarity larger than 95%. Some scoring guidelines were formulated to limit the evaluation bias.
Table 1 and
Table 2 synthesize the links between scores and image descriptions.
Figure 10 and
Figure 11 display for each score an evaluation example corresponding to the scoring guidelines. The mean scores are computed for all evaluations of all frames. The LCA and RCA frames were also considered separately to determine if reconstruction performs better on a specific class.