Article

On the Application of a Sparse Data Observers (SDOs) Outlier Detection Algorithm to Mitigate Poisoning Attacks in UltraWideBand (UWB) Line-of-Sight (LOS)/Non-Line-of-Sight (NLOS) Classification

by
Gianmarco Baldini
European Commission, Joint Research Centre, 21027 Ispra, Italy
Future Internet 2025, 17(2), 60; https://doi.org/10.3390/fi17020060
Submission received: 30 November 2024 / Revised: 22 January 2025 / Accepted: 23 January 2025 / Published: 3 February 2025

Abstract

The classification of the wireless propagation channel as Line-of-Sight (LOS) or Non-Line-of-Sight (NLOS) is useful in the operation of wireless communication systems. The research community has increasingly investigated the application of machine learning (ML) to LOS/NLOS classification, and this paper is part of this trend, but not all aspects of ML have been analyzed. In the general ML domain, poisoning and adversarial attacks, together with the related mitigation techniques, are an active area of research. Such attacks aim to hamper the ML classification process by poisoning the data set, and mitigation techniques are designed to counter this threat using different approaches. Poisoning attacks in LOS/NLOS classification have not received significant attention from the wireless communication community, and this paper aims to address this gap by proposing the application of a specific mitigation technique based on outlier detection algorithms. The rationale is that poisoned samples can be identified as outliers with respect to legitimate samples. In particular, the study described in this paper adopts a recent outlier detection algorithm with low computing complexity: the sparse data observers (SDOs) algorithm. The study proposes a comprehensive analysis of both conventional and novel types of attacks and of the related mitigation techniques based on outlier detection algorithms for UltraWideBand (UWB) channel classification. The proposed techniques are applied to two data sets: the public eWINE data set with seven different UWB LOS/NLOS environments and a radar data set with the LOS/NLOS condition. The results show that the SDO algorithm outperforms other outlier detection algorithms for attack detection, like the isolation forest (iForest) algorithm and the one-class support vector machine (OCSVM), in most of the scenarios and attacks, and it is quite competitive in the task of increasing the UWB LOS/NLOS classification accuracy through sanitation in comparison to the poisoned model.

1. Introduction

Wireless communication systems are heavily dependent on the propagation channels through which the signal is transmitted. Thus, knowledge of the propagation channel can be useful information to improve the quality of transmissions. An essential distinction is between Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS) propagation scenarios because of their significant impact on the transmitted signal, which can degrade the Quality of Service (QoS). Knowledge of this information can be used to apply mitigation techniques in the signal processing stage to improve various wireless communication functions like physical layer scheduling, network engineering, beam selection, and resource management in wireless communication networks [1,2]. In general, the design of channel classifiers has to achieve a balance in the trade-off between computing time and accuracy in the correct identification of the LOS/NLOS channel status.
In the literature, different approaches have been proposed, from channel identification based on statistical analysis (i.e., using metrics like the K-factor in the Rician fading model [3] and the Root Mean Square (RMS) delay spread or the mean access delay in [4]) to recent developments based on the application of shallow and deep learning algorithms [5,6]. For simplicity, throughout the rest of this paper, shallow machine learning algorithms are indicated with the term ML and deep learning algorithms with the term DL, even if it is acknowledged that machine learning usually includes deep learning. Although DL has demonstrated superior classification performance in LOS/NLOS classification problems [7,8], its computing needs are severe, resulting in long computing times for training and prediction. For this reason, this paper focuses only on ML.
As described in the subsequent section, Section 2, a number of studies have applied ML algorithms to channel classification in recent times, but a key aspect has not been addressed: as in other ML domains, adversarial attacks aimed at hampering the proper functioning of ML algorithms have been largely ignored. Thus, this paper aims to address this gap by proposing an analysis of poisoning attacks in the application of ML to LOS/NLOS classification using Outlier Detection (OD) algorithms. In particular, this paper addresses the specific case of UltraWideBand (UWB) LOS/NLOS classification. The analysis is based on the assumption that poisoned samples may appear as outliers in the data set on the basis of the outlier score provided by the OD algorithm (the higher the score, the higher the probability that the sample is an outlier). OD algorithms are then used both to identify the poisoned samples and to ‘sanitize’ the model by using only the percentage of samples that have the lowest outlier scores.
Poisoning attacks can be of different types, with different underlying algorithms and severity. This paper conducts an extensive study on the impact of the type and severity of the attack, focusing on the so-called data poisoning class of attacks as indicated in [9], which includes the label flipping attack and the randomized alteration of samples. In addition, this paper proposes a novel attack based on the OD algorithms themselves. This attack is inspired by the perturbation attack in the literature, where the feature values are modified using random noise. In the novel attack proposed in this paper, the attacker also has knowledge of the most likely values of the features for each class. This is a more sophisticated attack than the simple perturbation attack, and it requires that the attacker has knowledge of (or has acquired) a number of samples representative of a specific class. In this study, this attack is emulated by choosing, for each label, the 100 samples returned by the Sparse Data Observers (SDO) algorithm (one of the OD algorithms used in this paper) with the lowest scores (a lower score indicates a more representative sample of a specific class).
Three OD algorithms are used: iForest, OCSVM, and the recently proposed SDO algorithm, which is characterized by its low computing complexity. A summary of the key contributions of this paper in comparison to the existing literature is presented in the subsequent section on related work, Section 2.
The structure of this paper is as follows: Section 2 describes the related work in this area and the advancements of this paper in comparison to the surveyed literature. Section 3 describes the overall methodology of the proposed approach, including the OD algorithms used in the study, the threat model, the ML algorithm used for classification, and the evaluation metrics. Section 4 describes the two data sets used to evaluate the approach. Section 5 provides the results of the evaluation of the proposed approach on the real UWB signals described in Section 4. Finally, Section 6 provides the conclusions of this study and outlines future developments.

2. Related Work

The investigation of poisoning attacks in ML for LOS/NLOS classification is a new field, with no studies identified by the author so far. On the other hand, the literature comprises significant and recent studies on the application of ML for LOS/NLOS classification and on poisoning attacks in ML and DL. This section provides an overview of both areas; it does not aim to be exhaustive because extensive surveys on these topics already exist. In particular, this section focuses on the following: (1) the application of ML only (not DL) to LOS/NLOS classification, and (2) the analysis of related work on adversarial ML and poisoning attacks specifically for the wireless communication domain.
From a historical point of view, the problem of channel identification exploited the application of statistical wireless channel models, whose parameters (e.g., the parameter m in Nakagami models) were estimated using measurement campaigns in real scenarios. Recent studies have applied ML techniques, as explained in two recent surveys [10,11]. In many studies, the application of ML is related to the extraction of features from the transmitted signal, which are given as input to the classifier. Some examples of this approach can be found in [1]; standard deviation, skewness, and kurtosis were used in [1,5,12,13], Shannon entropy in [14], and cumulants in [15]. Different ML algorithms have been applied. As described in [10], the most common algorithms are K Nearest Neighbor (KNN), Support Vector Machine (SVM), and Decision Tree–Random Forest (DT-RF). In particular, Decision Tree (DT) and Random Forest (RaF) were used in [16,17], and the latter is also adopted in this paper. The features can be extracted not only from the time domain but also from the spectral domain, or even from the modes resulting from the application of mode decomposition algorithms like Variational Mode Decomposition (VMD) as in [6], even if there is the potential trade-off that the mode decomposition implementation can be time consuming.
A recent survey on adversarial attacks in wireless communication is presented in [18], where it is recognized that the potential security problems of DL in wireless applications have not been fully studied yet. Thus, vulnerabilities in ML and DL may open up opportunities for even small-scale adversarial attacks to severely degrade the model’s performance. A study similar to this one is [19], where poisoning attacks are investigated for the application of DL to signal modulation classification, which is a different problem from LOS/NLOS classification. An OD algorithm was exploited in [19] to separate poisoned samples from legitimate samples resulting from the execution of a label flipping attack (as in this study). On the other hand, the other attacks investigated in this paper were not considered there. A binary classification was performed as in this study, but it is based on a Convolutional Neural Network (CNN) in [19], rather than the Random Forest (RaF) algorithm used here.
Another example is [20], which focused on poisoning attacks in traffic prediction problems: a perturbation is added to the legitimate samples during the model training phase to emulate a poisoning attack on traffic prediction in cellular networks. The proposed attack is similar to one of the attacks considered in this study, but the OD algorithms proposed here are not applied (the feature space is instead augmented on the basis of the characteristics of the specific problem). In addition, [20] specifically addresses DL algorithms, while this paper is focused on ML.
Regarding other aspects pertaining to poisoning attacks and ML in the literature, extensive literature surveys are offered in [9,21,22], to which the interested reader can refer. The key points that can be extracted from these surveys for this study are mostly related to the identification of the types of attack, including label flipping and the perturbation of legitimate samples, and to the potential application of OD algorithms to identify poisoned samples and implement sanitation of the poisoned data set. This study is focused on the data poisoning class of attacks as indicated in [9]. Moreover, this paper also proposes a new type of data poisoning based on the clustering function of the OD algorithm itself, as described in more detail in Section 3.2.
To summarize, the key advancements of this paper in comparison to the surveyed literature are the following:
  • This study proposes a novel analysis of poisoning attacks on the application of ML to the problem of LOS/NLOS fading classification, which has not been attempted in the literature yet.
  • In the context of ML applied to wireless communication, this paper proposes the novel application of the Sparse Data Observers (SDO) algorithm, which performs better than OD algorithms commonly used in the literature, like iForest and OCSVM, in most of the considered attack scenarios.
  • A new type of poisoning attack is introduced, which is itself based on OD algorithms, or more specifically on their clustering function.
The main limitation of this study is that it addresses ML algorithms and not DL algorithms; this choice is justified by the lack of analysis in this area and by the consideration that computing efficiency is particularly important in LOS/NLOS classification.

3. Methodology

3.1. Workflow

The overall set of procedures composing the proposed approach is shown in Figure 1.
As the proposed approach is based on the application of ML algorithms, the time series (i.e., the Channel Impulse Response (CIR)) is transformed into a feature space by applying a feature extraction process using the following features: variance, skewness, kurtosis, Shannon entropy, quantile with a threshold of 0.7, and quantile with a threshold of 0.9. These features are applied to the original time domain representation of the CIR as well as to the magnitude and phase components of the spectral domain representation obtained with the Fast Fourier Transform (FFT) algorithm, for a total of 18 features. These features are used in this approach because they have been applied to the same problem in the literature [6,23,24]. The feature space is normalized between 0 and 1, then it is labeled with LOS and NLOS labels and split into a training and a test data set (3/4 of the data set is for training and 1/4 for testing).
Then, poisoning attacks are applied to the training data set. As described in the threat model given in Section 3.2, three attacks are implemented: the classical label flipping attack [25], the perturbation attack, and a perturbation based on a template created from the best 100 samples for each class (LOS and NLOS). The knowledge and tools needed by the attacker to implement each attack are described in detail in Section 3.2. Different poisoned data sets are created for various values of the percentage of poisoned training samples $T_P$ and of the severity of the attack $S_P$ (the amount of perturbation applied to the feature values). $S_P$ is not applicable to the label flipping attack.
The OD algorithms are applied to the poisoned feature spaces. Three algorithms are adopted in this study: the titular SDO algorithm [26], the iForest algorithm [27], and the OCSVM algorithm [28]. These generate a list of outlier scores for all the samples of the poisoned feature spaces. The assumption is that the highest scores (which indicate an anomaly) can be used to identify the poisoned samples, while the lowest scores can be used to select the legitimate samples, which have not been tampered with. This assumption can be impaired by the presence of anomalies in the initial data set, which will be flagged as poisoned samples even though they are actually legitimate (this is one of the limitations of the proposed approach). The highest scores are used for the first evaluation metric, which measures the capability of the approach and of the OD algorithm to identify the poisoned samples, while the lowest scores are used to sanitize the poisoned data set; the classification accuracy of the ML algorithm is the second evaluation metric. See Section 3.4 for further details. For the second metric, the results of the ML classification are compared between its application to the poisoned feature space and to the sanitized feature space.
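To make the feature extraction step concrete, the following is a minimal sketch of the 18-feature computation described above. It is illustrative only: the study was implemented in MATLAB, the sketch uses Python with NumPy/SciPy, and both the histogram-based Shannon entropy estimator and the assumption of a real-valued CIR are choices that the paper does not specify.

import numpy as np
from scipy.stats import skew, kurtosis

def shannon_entropy(x, bins=64):
    # Histogram-based Shannon entropy estimate; the paper does not state
    # which estimator was used, so this discretization is an assumption.
    hist, _ = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def six_features(x):
    # Variance, skewness, kurtosis, Shannon entropy, 0.7 and 0.9 quantiles.
    return np.array([np.var(x), skew(x), kurtosis(x),
                     shannon_entropy(x),
                     np.quantile(x, 0.7), np.quantile(x, 0.9)])

def extract_features(cir):
    # cir: 1-D, real-valued time-domain CIR (take np.abs first if complex).
    spectrum = np.fft.fft(cir)
    domains = [cir, np.abs(spectrum), np.angle(spectrum)]
    return np.concatenate([six_features(d) for d in domains])  # 18 features

def min_max_normalize(F):
    # Column-wise normalization of the feature matrix F to [0, 1].
    span = F.max(axis=0) - F.min(axis=0)
    return (F - F.min(axis=0)) / np.where(span == 0, 1, span)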

3.2. Threat Model

This section describes the threat model and the information the attacker needs in order to poison the feature space. The assumption of this study is that the attacker does not have access to the learning algorithms or modeling procedures, but he/she has the power to access the training data set and modify a percentage of it (identified by the parameter $T_P$) in different ways: either changing the label value or the feature values [9]. Three different attacks are considered in this study:
  • The first attack is the label flipping attack, where the attacker has knowledge of the label and changes its value to the other class (e.g., LOS to NLOS or vice versa).
  • The second attack is clean-label poisoning, where the attacker cannot change the label but he/she can change the values of the features.
  • The third attack is a variation of the second attack where the attacker also has knowledge of the most likely values of the features for each class.
The first attack, the label flipping attack, is one of the first attacks considered in the literature [21,22]. In this study, a random flipping attack is implemented: the attacker has knowledge of the label and changes its value to the other class (e.g., LOS to NLOS or vice versa), while the values of the features remain unchanged. The algorithm describing this attack is shown in Algorithm 1. Even if it is relatively old in the research literature, this attack is still addressed in this study because it is relatively difficult to mitigate for the unsupervised OD algorithms considered in this paper. This attack will be identified with the term LF in the rest of this paper. In the subsequent algorithms, the training data set is defined as $S_{Train} = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in [0,1]^d$ are the feature vectors with $d$ features and $y_i$ are the target values of the labels (i.e., 0 or 1 for the binary classification problem considered in this study).
Algorithm 1: Random label flipping (LF) algorithm

procedure LF(S_Train, T_P)
    S_Pois ← copy(S_Train)
    T ← random_select(S_Pois, T_P)      // subset of samples to poison
    m ← size(T)
    for i ← 1 to m do
        if label(T_i) == 0 then
            label(T_i) ← 1
        else
            label(T_i) ← 0
        end if
    end for
    return S_Pois
end procedure
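As a concrete illustration, Algorithm 1 maps directly onto a few array operations. The following NumPy sketch re-expresses the pseudocode in Python; the function name and the returned set of poisoned indices are conveniences of this sketch, not part of the original study.

import numpy as np

def label_flip(y, t_p, rng=None):
    # Flip the binary labels of a randomly selected fraction t_p of the samples.
    rng = np.random.default_rng() if rng is None else rng
    y_pois = y.copy()
    idx = rng.choice(len(y), size=int(t_p * len(y)), replace=False)
    y_pois[idx] = 1 - y_pois[idx]  # 0 -> 1 and 1 -> 0
    return y_pois, idx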
The second attack is a clean-label poisoning attack, where the attacker cannot change the label but he/she can change the values of the features. This may happen if the label information is kept separated from the features. This attack can be implemented with different levels of severity, regulated by the hyper-parameter $S_P$ in this study. In practical terms, a weighted sum is calculated between the initial value of the feature and a random value between 0 and 1 (note that all the features are normalized between 0 and 1 as described before). The corresponding algorithm description is provided in Algorithm 2, where $xp_{i,j}$ is the new value of the poisoned feature that is going to create the new poisoned training set $S_{Pois}$, $i$ is the index of the sample, with $i = 1 \ldots (T_P \times S_{Train})$, and $j$ is the index of the feature, with $j = 1 \ldots 18$. $xv_{i,j}$ is the value of the original feature, $RND(0,1)$ is the random function, which returns a value between 0 and 1, and $S_P$ is the severity of the attack, meant as the weight of the perturbation on the feature values. Here, $S_{Train}$ denotes the number of samples in the training set, which is equal to 2250 samples in this study. This attack will be identified with the term FSP in the rest of this paper.
Algorithm 2: Feature Scrambling Poisoning (FSP) algorithm

procedure FSP(S_Train, T_P, S_P)
    S_Pois ← copy(S_Train)
    T ← random_select(S_Pois, T_P)
    m ← size(T)
    for i ← 1 to m do
        xv_i ← features(T_i)
        d ← size(xv_i)
        for j ← 1 to d do
            xp_i,j ← (1 − S_P) × xv_i,j + S_P × RND(0,1)
        end for
        features(T_i) ← xp_i
    end for
    return S_Pois
end procedure
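A NumPy equivalent of Algorithm 2 is sketched below. One caveat: the weighted-sum form implies a perturbation weight in [0, 1], while the results in Section 5 report $S_P$ values up to 16, so the mapping between the reported $S_P$ values and the weight s_p used here is an assumption of this sketch.

import numpy as np

def fsp(X, t_p, s_p, rng=None):
    # Perturb a random fraction t_p of the samples with uniform noise,
    # weighted by the attack severity s_p (assumed here to lie in [0, 1]).
    rng = np.random.default_rng() if rng is None else rng
    X_pois = X.copy()
    idx = rng.choice(len(X), size=int(t_p * len(X)), replace=False)
    noise = rng.uniform(0.0, 1.0, size=(len(idx), X.shape[1]))
    X_pois[idx] = (1.0 - s_p) * X_pois[idx] + s_p * noise
    return X_pois, idx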
The third attack is a variation of the second attack where the attacker also has knowledge of the most likely values of the features for each class. This is a more sophisticated attack than the second attack, and it requires that the attacker has knowledge of (or has acquired) a representative set of samples for each class. In this study, this attack is emulated by choosing, for each label, the 100 samples returned by the SDO algorithm with the lowest scores. These templates are called $LOS_T$ for label = 0 and $NLOS_T$ for label = 1. Then, an attack similar to the second one is implemented, but the values of the features of the generic sample are replaced with a weighted sum based on the values of $LOS_T$ and $NLOS_T$, as described in Algorithm 3. This attack will be identified with the term TFP in the rest of this paper.
Algorithm 3: Targeted Feature Scrambling Poisoning (TFP) algorithm

procedure TFP(S_Train, T_P, S_P, LOS_T, NLOS_T)
    S_Pois ← copy(S_Train)
    T ← random_select(S_Pois, T_P)
    m ← size(T)
    for i ← 1 to m do
        xv_i ← features(T_i)
        if label(T_i) == 0 then      // LOS sample pushed towards the NLOS template
            xp_i ← (1 − S_P) × xv_i + S_P × NLOS_T
        else                         // NLOS sample pushed towards the LOS template
            xp_i ← (1 − S_P) × xv_i + S_P × LOS_T
        end if
        features(T_i) ← xp_i
    end for
    return S_Pois
end procedure
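The sketch below mirrors Algorithm 3 in NumPy. The per-feature template vectors los_t and nlos_t are assumed to be an aggregation (e.g., the mean) of the 100 lowest-scoring SDO samples per label; the paper does not state how the 100 template samples are combined, so the mean is an assumption of this sketch.

import numpy as np

def tfp(X, y, t_p, s_p, los_t, nlos_t, rng=None):
    # Push each poisoned sample towards the template of the opposite class.
    rng = np.random.default_rng() if rng is None else rng
    X_pois = X.copy()
    idx = rng.choice(len(X), size=int(t_p * len(X)), replace=False)
    for i in idx:
        target = nlos_t if y[i] == 0 else los_t  # opposite-class template
        X_pois[i] = (1.0 - s_p) * X_pois[i] + s_p * target
    return X_pois, idx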

3.3. Outlier Detection Algorithms

Three different OD algorithms are used in this study: the main one is the SDO algorithm, while a comparison is carried out with two other well-known algorithms, the Isolation Forest (IF) and the One-Class Support Vector Machine (OCSVM) algorithms. All three OD algorithms are of the unsupervised type because they must not rely on the label information, which may be poisoned by the label flipping attack.
SDO is an unsupervised algorithm that was developed to address the need for fast, highly interpretable, and intuitively parameterizable anomaly detection; it was initially introduced in [26]. The design of SDO is based on the concept of generating a low-density data model formed by observers. In SDO, an observer is a data object (i.e., a sample) that is located within the data mass, ideally at an equivalent distance to the other observers within the same cluster. The outlierness score of a sample in the data set is estimated using its distance to its observers. Like other OD algorithms, SDO is an eager learner that generates a low-density model of the data set during a training phase and, in a subsequent phase, compares new samples with the model. Because SDO does not require recalling old data samples, this algorithm has a less severe computational load than other OD algorithms during the application phase. In fact, the computing complexity of SDO is $O(nk)$, which behaves as $O(n)$ for large $n$, where $n$ is the number of samples in the data set and $k$ is the (fixed) number of observers [26].
The assumption behind the application of SDO to mitigate poisoning attacks on ML algorithms is that poisoned samples will be distant from the main LOS and NLOS clusters. An additional advantage of SDO is its low time complexity, which is beneficial for wireless communication data sets, which are usually composed of many samples (e.g., the eWINE data set used in this study is composed of 6000 samples × 7 scenarios, for 42,000 samples overall). To the best of the author’s knowledge, SDO has never been applied to poisoning attacks in ML in general, nor to LOS/NLOS classification in particular.
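To illustrate the scoring logic described above, the following is a simplified Python sketch of SDO: random observer sampling, removal of idle observers, and a median-distance outlierness score. It follows the description in [26] at a high level but is not the reference implementation; the parameter names k (observers), x (closest observers), and q (idleness quantile) follow the SDO paper, while the default values are arbitrary choices of this sketch.

import numpy as np
from scipy.spatial.distance import cdist

def sdo_scores(X, k=200, x=5, q=0.1, rng=None):
    # Sample k observers from the data set.
    rng = np.random.default_rng() if rng is None else rng
    obs = X[rng.choice(len(X), size=min(k, len(X)), replace=False)]
    d = cdist(X, obs)                        # sample-to-observer distances
    # Each sample "observes" its x closest observers; count the observations.
    nearest = np.argsort(d, axis=1)[:, :x]
    counts = np.bincount(nearest.ravel(), minlength=len(obs))
    # Remove idle observers (observation count below the q-quantile).
    active = counts >= np.quantile(counts, q)
    d_active = np.sort(d[:, active], axis=1)[:, :x]
    # Outlierness score: median distance to the x closest active observers.
    return np.median(d_active, axis=1)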
The IF algorithm was initially introduced in [27] and has been extensively used by the ML research community since. IF is derived from the random forest algorithm, and it generates an ensemble of trees on the data set given as input. The anomalies are those samples that have short average path lengths in the trees in comparison to the majority of samples. The IF algorithm is based on two main parameters: the number of trees in the ensemble and the sub-sampling size. Similar to SDO, IF has a linear time complexity with a low constant and a low memory requirement. In particular, if the sub-sampling size is termed $\phi$ and the number of trees $t$ (the input parameters of the IF algorithm), the computing complexity of IF is $O(t \phi \log \phi)$ for the training phase and $O(n t \log \phi)$ for the testing phase, where $n$ is the size of the testing portion of the data set.
The IF algorithm was chosen for this study because it was used to mitigate poisoning attacks in ML in other studies [29,30], although not for LOS/NLOS classification.
The OCSVM algorithm is an extension of SVM to one-class problems, and its use as an OD algorithm was shown in [28]. One-class SVM is an anomaly detection algorithm based on the concept of separating the data from the origin in a transformed high-dimensional predictor space using a kernel. This study uses the Gaussian kernel to identify the decision boundary, generating a maximum-margin hyperplane that separates the training samples from the origin in the feature space. In OCSVM, the outlierness score is estimated from the solution of the quadratic programming (QP) problem of the related soft-margin SVM formulation. As the OCSVM model is trained by solving a QP problem, the computational complexity may be $O(n^3)$, where $n$ is the number of samples in the data set, even if this study applies the sequential minimal optimization (SMO) algorithm, which was shown to provide relatively lower complexity in [31].
OCSVM is used in this study because it was proposed as an OD algorithm to mitigate ML poisoning attacks in other studies like [32], even if these studies were not specifically focused on LOS/NLOS classification.
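For reference, both baseline OD algorithms are available in common libraries. The study itself was run in MATLAB; the following scikit-learn sketch is an illustrative equivalent, with the sign of the library scores flipped so that, as in this paper, a higher score means a more likely outlier.

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

def iforest_scores(X, n_trees=100, subsample=256):
    # IsolationForest.score_samples is higher for inliers; negate it.
    clf = IsolationForest(n_estimators=n_trees, max_samples=subsample,
                          random_state=0).fit(X)
    return -clf.score_samples(X)

def ocsvm_scores(X):
    # Gaussian (RBF) kernel, as in the study; decision_function is
    # positive for inliers, so negate it to obtain an outlierness score.
    clf = OneClassSVM(kernel="rbf", gamma="scale").fit(X)
    return -clf.decision_function(X)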

3.4. Evaluation Metrics

The evaluation metrics used in this study are of two types. The first type evaluates the capability of the OD algorithm to detect the poisoned samples in the poisoned model, and it is expressed as follows:

$$P_{Det} = \frac{N_{OD}(T_P)}{N_P(T_P)}$$

where $N_{OD}(T_P)$ is the number of poisoned samples correctly identified by the OD algorithm among the samples with the highest outlier scores for a specific value of the poisoned percentage $T_P$, and $N_P(T_P)$ is the number of poisoned samples in the data set for that value of $T_P$. As described before, $N_P(T_P)$ is simply $S_{Train} \times T_P$, where $S_{Train}$ is the size of the training set (i.e., 2250 samples).
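In code, this metric reduces to checking how many of the truly poisoned indices fall in the top-scoring slice. A small illustrative helper follows; the function name and the use of the set of poisoned indices are conveniences of this sketch.

import numpy as np

def detection_rate(scores, pois_idx, t_p):
    # P_Det: fraction of the poisoned samples found among the n_p samples
    # with the highest outlier scores, where n_p = T_P * |S_Train|.
    n_p = int(t_p * len(scores))
    top = np.argsort(scores)[-n_p:]
    return len(set(top) & set(pois_idx)) / n_p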
The second type measures the performance of the classification in distinguishing LOS from NLOS samples. It is estimated using three metrics: accuracy, precision, and recall, expressed as follows:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

$$Precision = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$

where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.
In this specific study, the difference in performance between the sanitized and the poisoned model is even more relevant, because it measures the improvement in accuracy/precision/recall due to the sanitation, which in turn depends on how well the OD algorithm is able to select the legitimate samples and discard the poisoned ones. These improvement metrics are simply the following:

$$Accuracy_I = Accuracy_S - Accuracy_P$$

$$Precision_I = Precision_S - Precision_P$$

$$Recall_I = Recall_S - Recall_P$$

where $Metric_S$ is the value of the metric obtained by the ML algorithm trained on the sanitized training portion of the data set and $Metric_P$ is the value obtained using the poisoned training portion. The metric can be accuracy, precision, or recall.
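The whole sanitation-and-evaluation loop can be summarized in a few lines. The sketch below uses scikit-learn's RandomForestClassifier as a stand-in for the paper's MATLAB ensemble (see Section 3.5), so absolute numbers would differ from the reported ones; p_san is the fraction of lowest-scoring samples kept after sanitation.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def accuracy_improvement(X_pois, y_pois, X_test, y_test, scores, p_san=0.8):
    # Sanitized set: the p_san fraction of training samples with the
    # LOWEST outlier scores (assumed legitimate).
    keep = np.argsort(scores)[:int(p_san * len(scores))]
    model_p = RandomForestClassifier(random_state=0).fit(X_pois, y_pois)
    model_s = RandomForestClassifier(random_state=0).fit(X_pois[keep],
                                                         y_pois[keep])
    acc_p = accuracy_score(y_test, model_p.predict(X_test))
    acc_s = accuracy_score(y_test, model_s.predict(X_test))
    return acc_s - acc_p  # Accuracy_I from Section 3.4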

3.5. Classifier Algorithm

The RaF algorithm is used as a classifier on both the poisoned data set and the sanitized data set. While other ML algorithms could be used, there is already a significant number of hyper-parameters and OD algorithms in this study, so a single ML algorithm was chosen to limit the complexity of the analysis. The RaF algorithm was selected due to its scalability (the data set is composed of thousands of elements), its high accuracy, its robustness to outliers, and because it has shown better robustness and performance than other machine learning algorithms under adversarial attacks [33] and has consistently outperformed other ML algorithms, especially at higher rates of poisoning [34].
The RaF implementation used in this study is based on the gentle adaptive boosting algorithm, with a shrinkage learning rate equal to 0.1 and decision trees as weak learners. The computing environment is MATLAB R2023b on a hardware platform with an Intel(R) Core(TM) i9-10885H CPU with a clock of 2.40 GHz and 32 GB of RAM.

4. Data Set

Two different data sets were used to evaluate the proposed approach. The first data set is the public eWINE data set from [35], which was chosen because it was recently adopted by other researchers for the ML classification of LOS/NLOS channel conditions [36,37]. It has a rich set of samples (6000) for each of the seven scenarios included in the data set. The second data set is a radar data set created by the author and used in [38], in which a weather radar pulse signal is subjected to different fading conditions. To be consistent with the eWINE data set, two fading conditions were used: LOS (no fading) and NLOS.
The eWINE data set was created using the SNPN-UWB board with the DecaWave DWM1000 UWB radio module (DecaWave has since been acquired by Qorvo) by transmitting UWB pulses in seven different scenarios: Office 1, Office 2, a small apartment, a small workshop, a kitchen with a living room, a bedroom, and a boiler room. In each scenario, 3000 LOS samples and 3000 NLOS samples were taken by using one DecaWave DWM1000 UWB radio module as an anchor and a second one as a tag. Only the CIR data were used in this study and no other information, as in [36], to make it a purely ML problem.
The radar data set (called Radar in the rest of this paper) was created by the author using a weather radar pulse signal, which was subjected to fading conditions inspired by the tap-delay-line (TDL) configurations defined in the 3GPP standard [39], but customized with a Nakagami-m fading distribution (3GPP-like fading models). As described in more detail in [38], a signal generator was used to generate the weather radar test signal defined in [40] with a sampling frequency of 40.00 MHz and a pulse width of 1 microsecond. The carrier frequency for the generation of the radar pulses was set to 5650 MHz. Then, the pulse signals were transmitted through a channel emulator that emulates two different conditions: the LOS no-fading condition and the NLOS TDL-C condition from [39] with a Nakagami-m fading model with values of m = 0.5, 1, 2, 3, 4. For each condition, 3000 samples were created (600 samples for each of the five values of m of the Nakagami-m fading model) for a total of 6000 samples. Only two conditions were selected to be consistent with the eWINE data set.
As described in the subsequent section of this paper, the training portion of the data set (which is poisoned) was three-fourths of the entire data set while one-fourth was reserved for the classification step. This distribution was used for both data sets.

5. Results

This section provides the results of the proposed approach on the eWINE data set and the Radar data set. Considering the number of hyper-parameters in this study and the different scenarios in the eWINE data set, the analysis cannot be exhaustive for all the scenarios, so it mainly focuses on scenario 1 (Office 1), even if a comparison across the different scenarios is also provided. For the Radar data set, only one scenario is present. As described in Section 3.2, the term LF is used to identify the label flipping attack, the term FSP is used to indicate the attack where the feature space is perturbed by the factor $S_P$, and the term TFP is used to indicate the attack where the perturbation is applied towards the label template.
Table 1 summarizes the hyper-parameters and the range of values considered in this study for both data sets. As described before, $T_P$ is the hyper-parameter defining the percentage of poisoned samples in the entire training data set. This parameter has a range from 0.1 (i.e., 10% of the data) to 0.5 (i.e., half of the data). This range is similar to the one adopted for similar poisoning attacks in the wireless communication domain [19,20].
The results were obtained using a four-fold partition of the entire data set, where three-fourths of the data set was used to create the training data set on which poisoning attacks were implemented and one-fourth of the data set was used for testing. The final results were obtained by averaging the results obtained for each partition.
This section is structured in two different sub-sections: Section 5.1 for the eWINE data set and Section 5.2 for the Radar data set.

5.1. eWINE Data Set

This section provides the results for the eWINE data set. Section 5.1.1 shows the findings for the detection accuracy, while Section 5.1.2 shows the findings for the performance improvement between the sanitized and poisoned data set measured with A c c u r a c y I , P r e c i s i o n I , and R e c a l l I .

5.1.1. Detection Rate of the Poisoned Samples

This sub-section provides the results related to the metric measuring the detection performance on the poisoned samples. As described before, the metric is the ratio between the number of poisoned samples detected by the OD algorithm and the total number of poisoned samples. The metric is reported as $P_{Det}$.
Figure 2 and the related subfigures show a comparison of the detection rate $P_{Det}$ for the three attacks (LF, FSP, and TFP) and the three different OD algorithms for different values of $T_P$. It can be noted from Figure 2a that SDO and IF have similar values of $P_{Det}$ for values of $T_P$ lower than or equal to 0.3. Meanwhile, for the FSP and TFP attacks in Figure 2b,c, SDO has a much larger $P_{Det}$ than IF and OCSVM, which justifies the focus of this paper on the SDO algorithm. A potential reason for the high performance of SDO lies in the design of this algorithm, which focuses on the identification of the low-density areas in the multi-dimensional space; the effect of the FSP and TFP attacks is to lower the density of the samples because the inserted poisoned samples lie outside of the high-density region. It can be noted that for increasing values of $T_P$, the OCSVM algorithm is able to increase the value of $P_{Det}$, but this remains lower than the values of $P_{Det}$ obtained with SDO, even if it surpasses the $P_{Det}$ obtained by the IF algorithm. It can also be noted that the $P_{Det}$ obtained with SDO is roughly constant across the values of $T_P$, which indicates a good degree of stability in the results.
Figure 2, described above, provides a view of the impact of the $T_P$ parameter. It is also important to evaluate the impact of $S_P$, which is shown in Figure 3 only for the SDO algorithm for reasons of space. Note that the results for LF are not presented here because $S_P$ is not relevant for the LF attack. As expected, the value of $P_{Det}$ increases for higher values of $S_P$ because the perturbation is greater, which ‘moves’ the poisoned samples to less dense areas of the feature space, where the SDO algorithm is particularly suited to detecting anomalies, which are in most cases the poisoned samples. Overall, it can be seen that SDO manages to reach a remarkable $P_{Det}$ near 80% for $S_P = 16$. It can also be noted that $P_{Det}$ is slightly lower for the TFP attack at lower values of $S_P$. This justifies the introduction of the novel and more challenging TFP attack in comparison to the classic FSP perturbation attack.
For reasons of space, most of the results presented so far are related to scenario 1 (Office 1) of the eWINE data set. The subsequent figure, Figure 4, shows the detection rate obtained using the SDO algorithm across the different scenarios of the eWINE data set. The detection rate results are generally consistent across the different scenarios, which indicates that the proposed approach has a good degree of generalization.

5.1.2. Performance of the LOS/NLOS Classification

This sub-section provides the results related to the metric measuring the classification accuracy between LOS- and NLOS-related samples. An important aspect is the improvement in accuracy between the case where the sanitation step is applied and the case where this step is not applied and the poisoned model is used for classification. The first aspect, the absolute accuracy obtained on the sanitized data, is illustrated in Figure 5, where it can be seen that the accuracy is roughly equivalent among the three OD algorithms. The SDO algorithm is slightly better than the IF and OCSVM algorithms for the TFP attack and values of $T_P$ greater than 0.4. These values are obtained with a sanitation percentage $P_{San}$ of 80% (the 80% of the samples with the lowest outlier scores are selected). While it may seem counter-intuitive that SDO achieves a high detection rate (as shown in the previous sub-section) while the absolute classification accuracy is roughly equivalent across the OD algorithms, it should be considered that the two metrics are based on different sets of selected samples: $P_{Det}$ is based on the samples with the highest outlier scores (i.e., the anomalies), while the sanitation is based on the lowest outlier scores. Thus, the selected samples of the training data set are disjoint, or they scarcely overlap (depending on the values of the hyper-parameters).
Beyond the absolute accuracy, a more valuable indicator of the performance of the proposed approach is the improvement of accuracy between the application of the RaF algorithm on the sanitized training data and the poisoned training data.
Figure 6 shows the improvement in accuracy obtained with the SDO algorithm with $S_P = 16$ (for the FSP and TFP attacks) for the different attacks and for different values of the percentage $P_{San}$. It can be seen that a lower value of $P_{San}$ provides the best accuracy improvement, especially for the FSP and TFP attacks. In particular, for the lowest value of $P_{San}$, the improvement in accuracy is over 10%, which is a remarkable achievement, while it is still in the range between 2% and 4% for other values of $P_{San}$ and $T_P$. It should be noted that these values are achieved with the highest severity of the attack, $S_P = 16$. The approach has a better performance for the FSP and TFP attacks than for the LF attack. This may be because the perturbation is more likely to generate anomalies (which are then excluded from the sanitized training model) than the LF attack. There is also a trend across the three different attacks that higher values of $P_{San}$ (i.e., 90% and 95%) are not very effective in excluding poisoned samples; in fact, the improvement in accuracy can even be negative for a severe attack with $T_P = 0.5$. This is understandable because, for such values of $P_{San}$, many poisoned samples are still likely to be present in the training data. On the other hand, the potential benefit of applying OD as an instance selection tool to improve the accuracy should be considered, especially for high values of $P_{San}$.
To complement the previous results for accuracy, Figure 7 and Figure 8 and their related subfigures show the results for precision and recall, respectively.
For reasons of space, most of the results presented so far are related to scenario 1 (Office 1) of the eWINE data set. The subsequent Figure 9 shows the improvement in accuracy obtained using the SDO algorithm across the different environments of the eWINE data set.
To complement these results across the environments, Figure 10 and Figure 11 and the related subfigures show the corresponding results for precision and recall, respectively.

5.2. Radar Data Set

This section provides the results for the Radar data set. As for the eWINE data set, this section is divided into Section 5.2.1, which provides the results on the detection rate of the poisoned samples, and Section 5.2.2 for the improvement in performance between the sanitized and the poisoned data set.

5.2.1. Detection Rate of the Poisoned Samples

Figure 12 and the related subfigures show the comparison of the detection rate $P_{Det}$ for the three attacks (LF, FSP, and TFP) and the three different OD algorithms for different values of $T_P$ in the Radar data set. The results are presented for $S_P = 16$. In comparison to the eWINE data set (where the SDO algorithm was significantly more effective than the other OD algorithms), the differences among the OD algorithms are more balanced for the Radar data set. For the LF attack, SDO is clearly superior to IF and OCSVM, as shown in Figure 12a, but for the FSP and TFP attacks, each OD algorithm has strengths and weaknesses depending on the value of $T_P$. In particular, Figure 12b shows that for low values of $T_P$ ($T_P$ = 0.1 and 0.2), IF and OCSVM are slightly better than SDO, while for higher values of $T_P$ ($T_P$ greater than 0.2), SDO is slightly better than IF and significantly better than OCSVM. Thus, a summary of the results on the Radar data set and the eWINE data set seems to indicate that the detection accuracy of each OD algorithm depends on the data set structure and data distributions, but SDO has a strong detection performance, especially when its low computing complexity is also taken into consideration.

5.2.2. Performance of LOS/NLOS Classification

Figure 13 and the related subfigures show the accuracy obtained with the different OD algorithms and the random forest classifier on the Radar data set for the different attacks, with $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and $P_{San} = 80\%$. The results are consistent with those obtained with the eWINE data set: the accuracy is quite similar for different values of $T_P$ across the three algorithms. While SDO generally has a strong performance, the IF algorithm manages to achieve the best accuracy in most cases, even if the differences in accuracy are quite small in relative terms (less than or around 1%).
As for the eWINE data set, Figure 14 shows the difference in accuracy for the Radar data set obtained with the RaF algorithm on the sanitized data set and the poisoned data set using the SDO algorithm. This addresses one of the main goals of the study: to evaluate how OD algorithms, and SDO in particular, can be used to sanitize a poisoned data set for three different attacks. As in the case of the eWINE data set, the results shown in Figure 14 support the stated goal well. Figure 14a shows a positive increase in classification accuracy for the large majority of the values of $T_P$ and $P_{San}$, with the most significant improvements obtained with low values of $P_{San}$, as in the case of the eWINE data set. This is because, with a small value of $P_{San}$, the OD algorithm is able to retrieve mostly legitimate samples; when the value of $P_{San}$ is larger, the OD algorithm also includes poisoned samples, thus lowering the classification accuracy. For the LF and TFP attacks, a decreasing trend is noted in Figure 14a,c for increasing values of $T_P$: the classification accuracy tends to decrease with a higher presence of poisoned samples, while this trend was not significantly present in the eWINE data set. This trend reaches the point that, for $T_P = 0.5$, the improvement in accuracy is minimal or even negative, which seems to indicate that the OD algorithms are not able to distinguish the legitimate from the poisoned samples for such a large value of $T_P$. A possible reason is the impact of the poisoning attack on the shape of the signals in the data set, which is more severe in the Radar data set than in the eWINE data set, so the OD algorithm is more challenged in identifying the legitimate samples (i.e., those with a lower score from the OD algorithm). On the other hand, this trend is less relevant for the FSP attack in Figure 14b.
To complement the accuracy improvement results shown in Figure 14, this study also presents the results for the precision and recall metrics on the Radar data set in Figure 15 and Figure 16, respectively. It can be seen that the values and trends of precision and recall are coherent with the accuracy results presented before.

6. Conclusions and Future Developments

This paper has investigated the application of OD algorithms to the mitigation of three types of poisoning attacks in the context of the classification of UltraWideBand (UWB) Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS) propagation scenarios in two different data sets. Three different OD algorithms were evaluated together with the random forest algorithm for classification. The results show that for this particular problem, the OD algorithms are effective in the following: (1) identifying poisoned samples to a significant degree (more than 70% for selected scenarios and values of the parameters); and (2) increasing the classification accuracy (e.g., an increase surpassing 10% for selected scenarios and values of the parameters) in comparison to the poisoned data. While the SDO algorithm demonstrates a strong detection performance, which is higher than that of the other two OD algorithms, the improvement in classification accuracy between the sanitized data set and the poisoned data set is relatively balanced across the three different OD algorithms. The approach was evaluated on two data sets (one public data set and another created by the author, available on request) and using an extensive set of values of the hyper-parameters, which define the impact and severity of the poisoning attacks.
Future developments will expand this analysis to DL algorithms and poisoning attacks that are specific to that class of algorithm. In particular, poisoned samples may be generated using GANs (Generative Adversarial Networks), while OD algorithms could be applied to different representations of the signals (e.g., spectral domain) rather than using a feature-based approach like the one applied in this paper.

Funding

The author received no additional funding for implementing this study; this work was covered by the institutional budget of the JRC.

Data Availability Statement

This study uses the public eWINE data set [35]. The Radar data set is available on request from the author.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CIR	Channel Impulse Response
CNN	Convolutional Neural Network
DT	Decision Tree
DL	Deep Learning
FSP	Feature Scrambling Poisoning
GAN	Generative Adversarial Network
KNN	K-Nearest Neighbour
IF	Isolation Forest
LF	Label Flipping
LOS	Line-of-Sight
ML	Machine Learning
NLOS	Non-Line-of-Sight
OCSVM	One-Class Support Vector Machine
OD	Outlier Detection
PDF	Probability Density Function
RaF	Random Forest
RF	Radio Frequency
SDO	Sparse Data Observers
SVM	Support Vector Machine
TDL	Tap-Delay-Line
TFP	Targeted Feature Scrambling Poisoning
UWB	UltraWideBand
VMD	Variational Mode Decomposition

References

  1. Huang, C.; Molisch, A.F.; He, R.; Wang, R.; Tang, P.; Ai, B.; Zhong, Z. Machine learning-enabled LOS/NLOS identification for MIMO systems in dynamic environments. IEEE Trans. Wirel. Commun. 2020, 19, 3643–3657. [Google Scholar] [CrossRef]
  2. Yang, M.; Ai, B.; He, R.; Shen, C.; Wen, M.; Huang, C.; Li, J.; Ma, Z.; Chen, L.; Li, X.; et al. Machine-learning-based scenario identification using channel characteristics in intelligent vehicular communications. IEEE Trans. Intell. Transp. Syst. 2020, 22, 3961–3974. [Google Scholar] [CrossRef]
  3. Benedetto, F.; Giunta, G.; Toscano, A.; Vegni, L. Dynamic LOS/NLOS statistical discrimination of wireless mobile channels. In Proceedings of the 2007 IEEE 65th Vehicular Technology Conference-VTC2007-Spring, Dublin, Ireland, 22–25 April 2007; pp. 3071–3075. [Google Scholar]
  4. Guvenc, I.; Chong, C.C.; Watanabe, F. NLOS identification and mitigation for UWB localization systems. In Proceedings of the 2007 IEEE Wireless Communications and Networking Conference, Hong Kong, China, 11–15 March 2007; pp. 1571–1576. [Google Scholar]
  5. Zhang, J.; Liu, L.; Fan, Y.; Zhuang, L.; Zhou, T.; Piao, Z. Wireless channel propagation scenarios identification: A perspective of machine learning. IEEE Access 2020, 8, 47797–47806. [Google Scholar] [CrossRef]
  6. Baldini, G.; Bonavitacola, F. Channel identification with improved variational mode decomposition. Phys. Commun. 2022, 55, 101871. [Google Scholar] [CrossRef]
  7. Liu, Q.; Yin, Z.; Zhao, Y.; Wu, Z.; Wu, M. UWB LOS/NLOS identification in multiple indoor environments using deep learning methods. Phys. Commun. 2022, 52, 101695. [Google Scholar] [CrossRef]
  8. Wang, Q.; Li, H.; Zhao, D.; Chen, Z.; Ye, S.; Cai, J. Deep neural networks for CSI-based authentication. IEEE Access 2019, 7, 123026–123034. [Google Scholar] [CrossRef]
  9. Wang, Z.; Ma, J.; Wang, X.; Hu, J.; Qin, Z.; Ren, K. Threats to training: A survey of poisoning attacks and defenses on machine learning systems. ACM Comput. Surv. 2022, 55, 1–36. [Google Scholar] [CrossRef]
  10. Huang, C.; He, R.; Ai, B.; Molisch, A.F.; Lau, B.K.; Haneda, K.; Liu, B.; Wang, C.X.; Yang, M.; Oestges, C.; et al. Artificial Intelligence Enabled Radio Propagation for Communications. Part II: Scenario Identification and Channel Modeling. IEEE Trans. Antennas Propag. 2022, 70, 3955–3969. [Google Scholar] [CrossRef]
  11. Seretis, A.; Sarris, C.D. An Overview of Machine Learning Techniques for Radiowave Propagation Modeling. IEEE Trans. Antennas Propag. 2022, 70, 3970–3985. [Google Scholar] [CrossRef]
  12. Xiao, Z.; Wen, H.; Markham, A.; Trigoni, N.; Blunsom, P.; Frolik, J. Non-line-of-sight identification and mitigation using received signal strength. IEEE Trans. Wirel. Commun. 2014, 14, 1689–1702. [Google Scholar] [CrossRef]
  13. Muqaibel, A.H.; Landolsi, M.A.; Mahmood, M.N. Practical evaluation of NLOS/LOS parametric classification in UWB channels. In Proceedings of the 2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA), Sharjah, United Arab Emirates, 12–14 February 2013; pp. 1–6. [Google Scholar]
  14. Tabaa, M.; Diou, C.; El Aroussi, M.; Chouri, B.; Dandache, A. LOS and NLOS identification based on UWB stable distribution. In Proceedings of the 2013 25th International Conference on Microelectronics (ICM), Beirut, Lebanon, 15–18 December 2013; pp. 1–4. [Google Scholar]
  15. Oualla, H.; Fateh, R.; Darif, A.; Safi, S.; Pouliquen, M.; Frikel, M. Channel Identification Based on Cumulants, Binary Measurements, and Kernels. Systems 2021, 9, 46. [Google Scholar] [CrossRef]
  16. Yang, G.; Zhang, Y.; He, Z.; Wen, J.; Ji, Z.; Li, Y. Machine-learning-based prediction methods for path loss and delay spread in air-to-ground millimetre-wave channels. IET Microw. Antennas Propag. 2019, 13, 1113–1121. [Google Scholar] [CrossRef]
  17. AlHajri, M.I.; Ali, N.T.; Shubair, R.M. Classification of indoor environments for IoT applications: A machine learning approach. IEEE Antennas Wirel. Propag. Lett. 2018, 17, 2164–2168. [Google Scholar] [CrossRef]
  18. Luo, X.; Qin, Q.; Gong, X.; Xue, M. A survey of adversarial attacks on wireless communications. In Proceedings of the International Conference on Edge Computing and IoT: Systems, Management and Security, ICECI, Virtual, 22–23 December 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 83–91. [Google Scholar]
  19. Davaslioglu, K.; Sagduyu, Y.E. Trojan attacks on wireless signal classification with adversarial machine learning. In Proceedings of the 2019 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), Newark, NJ, USA, 11–14 November 2019; pp. 1–6. [Google Scholar]
  20. Zheng, T.; Li, B. Poisoning attacks on deep learning based wireless traffic prediction. In Proceedings of the IEEE INFOCOM 2022-IEEE Conference on Computer Communications, New York, NY, USA, 2–5 May 2022; pp. 660–669. [Google Scholar]
  21. Tian, Z.; Cui, L.; Liang, J.; Yu, S. A comprehensive survey on poisoning attacks and countermeasures in machine learning. ACM Comput. Surv. 2022, 55, 1–35. [Google Scholar] [CrossRef]
  22. Pitropakis, N.; Panaousis, E.; Giannetsos, T.; Anastasiadis, E.; Loukas, G. A taxonomy and survey of attacks against machine learning. Comput. Sci. Rev. 2019, 34, 100199. [Google Scholar] [CrossRef]
  23. Kristensen, J.B.; Ginard, M.M.; Jensen, O.K.; Shen, M. Non-line-of-sight identification for UWB indoor positioning systems using support vector machines. In Proceedings of the 2019 IEEE MTT-S International Wireless Symposium (IWS), Guangzhou, China, 19–22 May 2019; pp. 1–3. [Google Scholar]
  24. Li, W.; Zhang, T.; Zhang, Q. Experimental researches on an UWB NLOS identification method based on machine learning. In Proceedings of the 2013 15th IEEE International Conference on Communication Technology, Guilin, China, 17–19 November 2013; pp. 473–477. [Google Scholar]
  25. Biggio, B.; Nelson, B.; Laskov, P. Poisoning attacks against support vector machines. arXiv 2012, arXiv:1206.6389. [Google Scholar]
  26. Vázquez, F.I.; Zseby, T.; Zimek, A. Outlier detection based on low density models. In Proceedings of the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore, 17–20 November 2018; pp. 970–979. [Google Scholar]
  27. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422. [Google Scholar]
  28. Amer, M.; Goldstein, M.; Abdennadher, S. Enhancing one-class support vector machines for unsupervised anomaly detection. In Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description, Chicago, IL, USA, 11 August 2013; pp. 8–15. [Google Scholar]
  29. Wolf, S.; Gamboa, W.; Borowczak, M. Jangseung: A guardian for machine learning algorithms to protect against poisoning attacks. In Proceedings of the 2021 IEEE International Smart Cities Conference (ISC2), Manchester, UK, 7–10 September 2021; pp. 1–7. [Google Scholar]
  30. Huang, S.; Bai, Y.; Wang, Z.; Liu, P. Defending against Poisoning Attack in Federated Learning Using Isolated Forest. In Proceedings of the 2022 2nd International Conference on Computer, Control and Robotics (ICCCR), Shanghai, China, 18–20 March 2022; pp. 224–229. [Google Scholar]
  31. Platt, J. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines; Technical Report, Microsoft Research Technical Report; Microsoft: Albuquerque, NM, USA, 1998. [Google Scholar]
  32. Paudice, A.; Muñoz-González, L.; Gyorgy, A.; Lupu, E.C. Detection of adversarial training examples in poisoning attacks through anomaly detection. arXiv 2018, arXiv:1802.03041. [Google Scholar]
  33. Yerlikaya, F.A.; Bahtiyar, Ş. Data poisoning attacks against machine learning algorithms. Expert Syst. Appl. 2022, 208, 118101. [Google Scholar] [CrossRef]
  34. Dunn, C.; Moustafa, N.; Turnbull, B. Robustness evaluations of sustainable machine learning models against data poisoning attacks in the internet of things. Sustainability 2020, 12, 6434. [Google Scholar] [CrossRef]
  35. Bregar, K.; Hrovat, A.; Mohorcic, M. Nlos channel detection with multilayer perceptron in low-rate personal area networks for indoor localization accuracy improvement. In Proceedings of the 8th Jožef Stefan International Postgraduate School Students’ Conference, Ljubljana, Slovenia, 31 May–1 June 2016; Volume 31, pp. 1–8. [Google Scholar]
  36. Jiang, C.; Shen, J.; Chen, S.; Chen, Y.; Liu, D.; Bo, Y. UWB NLOS/LOS classification using deep learning method. IEEE Commun. Lett. 2020, 24, 2226–2230. [Google Scholar] [CrossRef]
  37. Musa, A.; Nugraha, G.D.; Han, H.; Choi, D.; Seo, S.; Kim, J. A decision tree-based NLOS detection method for the UWB indoor location tracking accuracy improvement. Int. J. Commun. Syst. 2019, 32, e3997. [Google Scholar] [CrossRef]
  38. Baldini, G.; Bonavitacola, F. Nakagami-m Fading Channel Identification Using Adaptive Continuous Wavelet Transform and Convolutional Neural Networks. Algorithms 2023, 16, 277. [Google Scholar] [CrossRef]
  39. 3GPP. Study on Channel Model for Frequencies from 0.5 to 100 GHz. 2017. Available online: https://www.3gpp.org/ftp//Specs/archive/38_series/38.901/38901-e00.zip (accessed on 4 October 2021).
  40. ETSI EN 301 893 V2.1.1 (2017-05); 5 GHz RLAN; Harmonised Standard Covering the Essential Requirements of Article 3.2 of Directive 2014/53/EU. ETSI: Sophia-Antipolis, France, 2020. Available online: https://www.etsi.org/deliver/etsi_en/301800_301899/301893/02.01.01_60/en_301893v020101p.pdf (accessed on 4 October 2021).
Figure 1. Set of procedures composing the workflow of the proposed approach.
Figure 2. Comparison of the OD algorithms for the detection rate of poisoned samples with scenario 1 (Office 1) and $S_P = 16$. The y-axis provides the percentage of poisoned samples correctly identified as such by the OD algorithm; the x-axis gives the percentage $T_P$ of poisoned samples over the overall samples.
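For clarity, the detection rate plotted in Figures 2 and 3 is the fraction of truly poisoned training samples that the OD algorithm flags as outliers. A minimal sketch of this computation follows; the function name and the two boolean masks are illustrative and not taken from the paper's code.

```python
import numpy as np

def poison_detection_rate(is_poisoned: np.ndarray, flagged: np.ndarray) -> float:
    """Fraction of truly poisoned samples that the OD algorithm flags as outliers."""
    n_poisoned = int(is_poisoned.sum())
    if n_poisoned == 0:
        return 0.0
    return float((is_poisoned & flagged).sum()) / n_poisoned

# Toy usage: 100 samples, T_P = 20% poisoned, the detector catches 15 of the 20.
truth = np.zeros(100, dtype=bool); truth[:20] = True
flags = np.zeros(100, dtype=bool); flags[:15] = True
print(poison_detection_rate(truth, flags))  # 0.75
```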
Figure 3. Impact of the $S_P$ parameter on the detection rate of poisoned samples with the first scenario (Office 1) and the SDO algorithm. The y-axis provides the percentage of poisoned samples correctly identified; the x-axis indicates the percentage $T_P$ of poisoned samples.
Figure 4. Detection rate of poisoned samples with the SDO algorithm and the random forest classifier for the different scenarios of the eWINE data set; $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and $P_{SAN} = 80\%$.
Figure 5. Accuracy obtained with the different OD algorithms and the random forest classifier for the different attacks for scenario 1 (Office 1); $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and $P_{SAN} = 80\%$.
Figure 6. Improvement of the accuracy $Accuracy_I$ with the SDO algorithm and the random forest classifier for the different attacks within scenario 1 (Office 1), with $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and different values of $P_{SAN}$.
Figure 7. Improvement of the precision $Precision_I$ with the SDO algorithm and the random forest classifier for the different attacks for scenario 1 (Office 1); $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and different values of $P_{SAN}$.
Figure 8. Improvement of the recall $Recall_I$ with the SDO algorithm and the random forest classifier for the different attacks for scenario 1 (Office 1); $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and different values of $P_{SAN}$.
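Assuming that each improvement metric in Figures 6, 7 and 8 denotes the difference between the score of the classifier retrained on the sanitized training set and the score of the classifier trained on the poisoned set (an assumption here, since the formal definitions appear earlier in the paper), the computation can be sketched as follows:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

def improvement_metrics(y_true, y_pred_poisoned, y_pred_sanitized):
    # Positive values mean that the OD-based sanitization improved the metric
    # with respect to training directly on the poisoned data set.
    return {
        "Accuracy_I": accuracy_score(y_true, y_pred_sanitized)
                      - accuracy_score(y_true, y_pred_poisoned),
        "Precision_I": precision_score(y_true, y_pred_sanitized)
                       - precision_score(y_true, y_pred_poisoned),
        "Recall_I": recall_score(y_true, y_pred_sanitized)
                    - recall_score(y_true, y_pred_poisoned),
    }
```

With this reading, positive values in Figures 6, 7 and 8 correspond to combinations of attack and $P_{SAN}$ for which the sanitized model outperforms the poisoned one.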
Figure 9. Improvement of the accuracy $Accuracy_I$ with the SDO algorithm and the random forest classifier for the different scenarios of the eWINE data set; $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and $P_{SAN} = 80\%$.
Figure 10. Improvement of the precision $Precision_I$ with the SDO algorithm and the random forest classifier for the different scenarios of the eWINE data set; $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and $P_{SAN} = 80\%$.
Figure 11. Improvement of the recall $Recall_I$ with the SDO algorithm and the random forest classifier for the different scenarios of the eWINE data set; $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and $P_{SAN} = 80\%$.
Figure 12. Comparison of the OD algorithms for the detection rate of poisoned samples with $S_P = 16$ and the Radar data set. The y-axis provides the percentage of poisoned samples correctly identified as such by the OD algorithm; the x-axis gives the percentage $T_P$ of poisoned samples over the overall samples.
Figure 13. Accuracy obtained with the different OD algorithms and the random forest classifier for the different attacks on the Radar data set; $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and $P_{SAN} = 80\%$.
Figure 14. Improvement of the accuracy $Accuracy_I$ for the Radar data set with the SDO algorithm and the random forest classifier for the different attacks; $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and different values of $P_{SAN}$.
Figure 15. Improvement of the precision $Precision_I$ for the Radar data set with the SDO algorithm and the random forest classifier for the different attacks; $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and different values of $P_{SAN}$.
Figure 16. Improvement of the recall $Recall_I$ for the Radar data set with the SDO algorithm and the random forest classifier for the different attacks; $S_P = 16$ for the FSP and TFP attacks (for the LF attack, $S_P$ does not apply) and different values of $P_{SAN}$.
Table 1. Hyper-parameters considered in this study and related ranges.

| Hyper-Parameter | Description | Ranges | Notes |
|---|---|---|---|
| $T_P$ | Percentage of poisoned samples | 0.1, 0.2, 0.3, 0.4, 0.5 | |
| $S_P$ | Severity of the FSP and TFP attacks | 0.25, 0.5, 1, 4, 16 | This parameter does not apply to the LF attack |
| $P_{SAN}$ | Percentage of the data set used for sanitization | 0.6, 0.7, 0.8, 0.9, 0.95 | |
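To make the interplay of the three hyper-parameters concrete, the sketch below wires the ranges of Table 1 into a simplified version of the sanitization workflow: poison a $T_P$ fraction of the training samples with perturbations scaled by $S_P$, keep the $P_{SAN}$ fraction of samples that an OD algorithm scores as most inlier-like, retrain, and measure the accuracy. This is an illustrative sketch, not the paper's pipeline: the SDO algorithm is not shipped with scikit-learn, so IsolationForest (one of the baselines compared in the paper) stands in for the OD step, the feature-space perturbation is a simplified stand-in for the FSP attack, and the features and labels are synthetic placeholders rather than UWB channel measurements.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                 # placeholder features (not UWB CIR data)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # placeholder LOS/NLOS labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

T_P_range = [0.1, 0.2, 0.3, 0.4, 0.5]      # fraction of poisoned samples
S_P_range = [0.25, 0.5, 1, 4, 16]          # severity of the feature perturbation
P_SAN_range = [0.6, 0.7, 0.8, 0.9, 0.95]   # fraction of the training set kept after sanitization

for t_p in T_P_range:
    for s_p in S_P_range:
        for p_san in P_SAN_range:
            # Simplified feature-space poisoning: perturb a random T_P fraction
            # of the training samples with noise scaled by S_P.
            Xp = X_tr.copy()
            idx = rng.choice(len(Xp), size=int(t_p * len(Xp)), replace=False)
            Xp[idx] += s_p * rng.normal(size=Xp[idx].shape)

            # OD-based sanitization: keep the P_SAN fraction of samples with the
            # highest (most inlier-like) scores.
            od = IsolationForest(random_state=0).fit(Xp)
            scores = od.score_samples(Xp)
            keep = np.argsort(scores)[-int(p_san * len(Xp)):]

            # Retrain on the sanitized set and evaluate on the clean test split.
            clf = RandomForestClassifier(random_state=0).fit(Xp[keep], y_tr[keep])
            acc = accuracy_score(y_te, clf.predict(X_te))
            print(f"T_P={t_p}, S_P={s_p}, P_SAN={p_san}: accuracy={acc:.3f}")
```

In the paper's actual experiments, the OD step is the SDO algorithm and the inputs are the eWINE and Radar UWB measurements; only the grid over the ranges of Table 1 is reproduced here.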