1. Introduction
The proliferation of Internet of Things (IoT) networks and Wireless Sensor Networks (WSNs) has revived an old yet serious form of attack: MAC-layer spoofing, also referred to as identity spoofing. In a MAC address spoofing attack, as the name suggests, a rogue wireless node masquerades as a legitimate device by cloning that device’s MAC address. Identity spoofing is, in general, a precursor to packet injection (another well-known type of attack) and thus requires careful consideration as part of any sound defense plan.
The most common way of defending against this form of attack is through the use of cryptographic techniques for MAC-address authentication [1]. Unfortunately, due to the resource limitations that are inherent in many IoT and WSN devices (e.g., low processing power, low memory capacity, and limited battery life), many of these devices operate with heavily scaled-down versions (if any) of encryption and authentication protocols. For example, it has been found that, for ease of installation by non-technical consumers, Philips IoT Smart Bulbs do not employ any form of encryption and authentication as specified by the 802.15.4 protocol standard [2]. Similarly, in a multihop WSN, the intermediate relaying nodes generally do not verify the authenticity of the relayed data frames; authenticity verification of these frames takes place only at the final (i.e., destination) node. Authentication by intermediate nodes is typically omitted in order to reduce the nodes’ energy consumption as well as to minimize the possibility of a battery exhaustion attack [3] (readers may refer to the seminal work by Nguyen et al. [4] for a complete survey of energy depletion attacks against low-power wireless networks).
It should be noted that in a number of standardized wireless protocols still in use today, cryptographic authentication is simply not intended for all stages/frames of the communication process. For example, in all variants of IEEE 802.11 preceding 802.11w, only data frames are protected, while control and management frames are sent without any protection [5]. Thus, one should make provisions for extra security measures when cryptographic authentication is not supported by the protocols deployed within certain application domains.
Clearly, in wireless systems with limited cryptographic and authentication protection, other alternative measures against MAC address spoofing are required. One such measure—which can also be used as an added layer of security even in wireless systems with extensive cryptographic and authentication protection—is the utilization of physical layer (i.e., signal-level) parameters.
Received Signal Strength Indicator (RSSI) is a wireless communication variable that is directly influenced by the transmission power and the location of the transmitter, as well as by environmental variables such as obstacles. As suggested in a number of earlier research works (e.g., [6,7,8]), RSSI values can be used to create a fingerprint profile of each device in a wireless network, and these profiles can then be used to perform a preliminary authenticity check against MAC spoofing attacks. RSSI profiling is an attractive ally against MAC spoofing attacks for a further reason: this single real-valued physical-layer variable is easy to use, requires no modifications to existing higher-layer protocols and applications, and has a very small processing and memory footprint.
There have been many research works investigating the use of RSSI profiling for the purpose of MAC spoofing detection (some of which are surveyed in Section 2). Most of these works implicitly assume that: (1) RSSI samples received from a non-moving transmitting device form a stationary time-series with normally distributed variance, and (2) RSSI values are independent and identically distributed (i.i.d.) samples from an unknown normal distribution. Moreover, in these works, the act of profiling a wireless device based on its RSSI values relies strongly on these two assumptions. However, our recent study calls both assumptions (that RSSI samples are stationary and i.i.d.) into question. Namely, through extensive real-world experimentation, we have observed that RSSI values measured by a receiving node are strongly affected by changes (e.g., moving objects) in the operating environment. In particular, we have observed that moving human bodies (and their absence) have a noticeable effect on the RSSI values of IoT devices deployed in a residential environment; as a result, the variance of the RSSI time-series changes significantly when occupants are present and move around the property. We call this effect time-series clustering [9] (refer to Figure 1, where there are two different clusters, one with lower volatility than the other). Furthermore, it is clear from the figure that there is a correlation between neighboring RSSI values; therefore, it would be hard to justify the claim that neighboring RSSI values are independent (as presumed by previous works [6,7,8]).
Except in a few use cases where there are no moving objects in the environment (e.g., farmland monitoring), most real-world IoT networks deploy computing/sensing nodes in environments with some number of movable objects. Thus, in order to account for changes in RSSI values due to the clustering effect described above, it is necessary to have an adaptive and/or multi-model RSSI-based profiling scheme that is able to reduce the rate of false positives (in our previous work [10], we demonstrated how the i.i.d. assumption about RSSI values can allow an adversary to evade detection systems that rely on RSSI-based profiling). In this work, we propose and study a multi-classifier system that profiles IoT devices based on their RSSI values under two moving-object conditions (presence vs. absence of moving objects in the surrounding environment). In addition, our profiling approach takes into consideration the relationship between neighboring RSSI values in the time-series to further improve the accuracy and robustness of IoT node profiles.
The content of this paper is organized as follows. In Section 2, we discuss some of the notable previous works on RSSI-based MAC address spoofing detection. In Section 3, we present the threat model and the main assumptions about the adversary’s capabilities as they pertain to our work. In Section 4, we propose our LSTM-based (Long Short-Term Memory) profiling scheme, which has been devised to detect and classify MAC-spoofing traffic. In Section 5, we discuss the adversarial traffic generation used to test the robustness of our approach, and we compare the effectiveness of our approach with the state-of-the-art RSSI-based approaches previously proposed to deal with adversarial attacks.
2. Related Works
Wireless MAC address spoofing detection is a well-studied topic in the literature on Wi-Fi and Wireless Sensor Networks. In their seminal paper [11], Faria and Cheriton were among the first to propose the use of RSSI values as a fingerprinting variable to detect MAC spoofing attacks in a WLAN environment. Their detection model assumes that there are multiple access points (APs) capable of receiving the wireless signals from all clients in the network, so that the RSSI values measured at each AP’s antenna for each transmitter are ultimately aggregated into a single profile. A masquerading attack is then detected by comparing the aggregated RSSI values of two consecutive data frames with the same MAC identifier. They have also demonstrated that, using multiple sensing APs and assuming constant transmission power, a physical node can be triangulated with an accuracy of 5 to 10 m. Unfortunately, the practical merit of these findings is rather limited, since the use of multiple overlapping APs is not always possible in WSN and IoT networks.
Chen et al. [12] and Wu et al. [6] have independently proposed the use of the k-means clustering algorithm to detect signal/frame spoofing by a rogue access point (AP). Their work is grounded on the assumption that the sequence of the last n RSSI values received from an AP would fluctuate only minimally around the mean in the absence of a rogue AP (i.e., an ‘Evil Twin’). Thus, when the elements of a received RSSI sequence are clustered into two clusters using the k-means algorithm in the absence of an Evil Twin, the distance between the two resulting centroids would be small (i.e., smaller than a threshold value). Conversely, a large distance between the centroids of the two clusters would indicate the existence of an Evil Twin AP with its own distinct RSSI distribution. However, since this approach does not involve any offline learning (i.e., a previously trained model of what should be considered a legitimate distribution), the MAC address spoofer and the legitimate node must transmit within relatively close time intervals for the detection to actually work.
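The centroid-distance test described above can be illustrated with a minimal sketch. This is not the implementation from [6] or [12]; the function names, the initialization at the extremes, and the threshold value are assumptions made purely for illustration.

```python
from statistics import mean


def two_means_centroids(rssi, iterations=20):
    """Run 1-D k-means with k=2 and return the two centroids (low, high)."""
    lo, hi = min(rssi), max(rssi)  # assumed initialisation: the two extremes
    for _ in range(iterations):
        near_lo = [x for x in rssi if abs(x - lo) <= abs(x - hi)]
        near_hi = [x for x in rssi if abs(x - lo) > abs(x - hi)]
        if not near_hi:  # every sample is identical, nothing to split
            break
        new_lo, new_hi = mean(near_lo), mean(near_hi)
        if (new_lo, new_hi) == (lo, hi):  # assignments have stabilised
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def looks_spoofed(rssi, threshold_db):
    """Flag the window when the two centroids are further apart than the threshold."""
    lo, hi = two_means_centroids(rssi)
    return (hi - lo) > threshold_db


# Hypothetical RSSI windows in dBm (illustrative values, not measured data).
single_sender = [-60, -61, -59, -60, -62, -60]
two_senders = [-60, -61, -59, -80, -81, -79]
print(looks_spoofed(single_sender, threshold_db=5))
print(looks_spoofed(two_senders, threshold_db=5))
```

Because both senders must appear in the same window of n samples, the sketch also makes the limitation noted above visible: a spoofer that transmits long after the legitimate node would produce a window with a single tight cluster.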
Sheng et al. [13] studied antenna diversity in 802.11 access points and its effect on RSSI device fingerprinting as well as spoofing detection. They demonstrated that RSSI values from a stationary transmitter collected at a stationary receiver form a mixture of two Gaussian distributions due to the antenna diversity permitted under the 802.11 protocol. Accordingly, they trained a Gaussian mixture model for each pair of wireless node and access point in the network and applied a log-likelihood ratio test to the sequence of the latest RSSI values received at each access point from a given MAC address. A transmitting node is ruled spoofed if the ratio test fails for more than n Gaussian mixture models, where n is smaller than the number of available access points in the network and needs to be set empirically. However, using available off-the-shelf hacking tools, an adversary can easily manipulate its transmission power to evade detection by this model, as discussed in later sections.
Gonzales et al. [14] have developed a novel technique known as context-leashing for the detection of public Evil Twin access points. They have argued that publicly available access points, such as the ones at franchise coffee shops (e.g., Starbucks), share service set identifiers (SSIDs) across different locations and oftentimes lack any authentication. This provides an opportunity for adversaries to spoof such SSIDs and trick clients into associating with a rogue access point (e.g., after performing a disassociation attack). The defense against Evil Twin APs proposed in [8] assumes the use of a so-called context-leashing engine. Upon association with a publicly available access point, the context-leashing engine collects a context list, which contains all SSIDs that are visible at the time of association with a particular SSID, together with their corresponding average RSSI values. For any future reassociation with a given SSID, a new context list is constructed and compared to the previously stored one. If the new context list of neighboring SSIDs and their average RSSI values does not have a significant (empirically defined) overlap with the historical context list, then the associated SSID is deemed an Evil Twin and the connection should be terminated. The main drawback of this method is the assumption that the list of SSIDs in a given geolocation remains relatively unchanged over time. However, with the tethering capabilities of today’s cellphones, this assumption is far from the truth.
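To make the context comparison concrete, the following is a minimal sketch of one possible overlap check. The overlap metric, the RSSI tolerance, the minimum-overlap threshold, and all SSID names are hypothetical choices made for illustration; the source only states that the overlap threshold is defined empirically.

```python
def context_overlap(stored, current, rssi_tolerance_db=10.0):
    """Fraction of stored SSIDs that are seen again with a similar average RSSI.

    `stored` and `current` map SSID -> average RSSI in dBm.
    """
    if not stored:
        return 0.0
    matches = sum(
        1
        for ssid, rssi in stored.items()
        if ssid in current and abs(current[ssid] - rssi) <= rssi_tolerance_db
    )
    return matches / len(stored)


def is_evil_twin(stored, current, min_overlap=0.5):
    """Deem the associated SSID an Evil Twin when the contexts barely overlap."""
    return context_overlap(stored, current) < min_overlap


# Hypothetical context recorded at first association.
stored = {"CafeGuest": -55, "Neighbor1": -70, "Neighbor2": -80, "Shop": -65}
# Later scans: one at the same place, one somewhere else.
same_place = {"CafeGuest": -57, "Neighbor1": -72, "Shop": -60, "PhoneHotspot": -40}
elsewhere = {"CafeGuest": -50, "Other": -60}

print(is_evil_twin(stored, same_place))
print(is_evil_twin(stored, elsewhere))
```

A transient hotspot such as the hypothetical `PhoneHotspot` entry does not lower this particular overlap score, but stored SSIDs that disappear over time do, which is the drawback noted above.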
3. Threat Model and Assumptions
In this section, we introduce the main notation and assumptions of our work, which are also illustrated in Figure 2. First, consider a simple setup in which a legitimate transmitting node (e.g., a temperature sensor), denoted by s, and a legitimate receiving node (e.g., an IoT hub), denoted by r, communicate over a wireless channel. We also assume that r utilizes an arbitrary approach (including the one we propose in this work) to profile s based on the RSSI samples it has received during a period free of any adversary, and then uses this profile at runtime to differentiate between received data frames that carry s’ MAC address (legitimate vs. spoofed ones). Finally, we assume an adversary with the following characteristics:
The adversary is situated at a location from which it can observe/receive signals transmitted by all legitimate senders (when sending data frames) and receivers (when sending acknowledgment frames back) in the given network.
The adversary is aware of the transmission power setting of the legitimate sender(s); this is not a strong assumption, as system information about most IoT/WSN devices is publicly accessible on the Internet.
The adversary has no prior knowledge of the actual physical/geographic locations of other (legitimate) nodes in the network.
Network participants, including the adversary, are equipped with regular/common omnidirectional antennas and are not capable of detecting the positional angle of the transmitting nodes. However, the adversary can move about in order to triangulate other nodes’ locations based on the strength of the signal received from those nodes [10].
The adversary itself is an active node capable of adjusting its transmission power.
The adversary is also capable of altering (i.e., spoofing) its MAC address value—i.e., it can generate data frames that carry MAC addresses of other legitimate nodes from this particular network.
The ultimate goal of the adversary is to impersonate a particular s by transmitting frames with s’ spoofed MAC address. The spoofed frames are specifically intended for a particular r. Since, according to the assumptions of our work, the transmitter’s RSSI values are registered and used by r for the purposes of MAC-spoofing detection, the adversary first needs to discover/adjust its transmission power such that its spoofed frames (when received by r) are accepted as genuine with a high probability, i.e., such that some desired probability of evasion is achieved. This particular problem, of how to discover/adjust the transmission power so as to achieve a certain evasion probability, is closely related to the optimal adversarial evasion problem introduced by Nelson et al. [15] and further extended by Madani and Vlajic [10] to the IoT realm.
4. Detection Approach: Deep Authentication
As demonstrated in Figure 1 (and argued in Section 1), the RSSI time-series values of a wireless IoT device are not i.i.d.; one can therefore incorporate the dependencies among neighboring RSSI values to build more robust and accurate predictive models for the purpose of device authentication. Deep autoencoders are deep generative neural networks that have demonstrated a strong capability of modeling latent variables in anomaly detection and authentication datasets [16]. LSTM autoencoders [17], in particular, are known for their generative modeling capabilities on time-series data. In this section, we present our novel technique for the authentication of legitimate IoT nodes using RSSI-based anomaly detectors that deploy LSTM autoencoders. In addition, expanding on our argument from Section 1 regarding the time-series clustering effect of RSSI values in dynamic environments, we discuss how our novel multi-LSTM autoencoder architecture is able to switch between multiple trained LSTM models at runtime, which helps to address the clustering effect of the RSSI time-series.
4.1. LSTM Autoencoder Anomaly Detector
In the context of our work, consider an ordered sequence of n RSSI values received from node s. The LSTM autoencoder is trained to learn two functions, namely an encoder and a decoder, such that decoding the encoded sequence reproduces the input sequence as closely as possible. In other words, as depicted in Figure 3, the LSTM autoencoder learns an encoding state that best describes the structure of the training/input data and a decoding function that reconstructs the input sequence from the encoding state with minimal error. In general, large reconstruction errors occur when the input does not conform to the structure previously learned by the LSTM autoencoder. As such, a large reconstruction error can be used as a measure of input anomaly [16,18,19,20].
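For reference, the reconstruction objective described above is commonly written in the following form. The symbols used here ($X$, $Z$, $\phi$, $\psi$, $\hat{X}$) are assumed for illustration only and may differ from the notation of the original formulation.

```latex
X = (x_1, x_2, \dots, x_n), \qquad
\phi : X \mapsto Z, \qquad
\psi : Z \mapsto \hat{X} = (\hat{x}_1, \hat{x}_2, \dots, \hat{x}_n),
\qquad
(\phi, \psi) = \arg\min_{\phi,\,\psi} \; \frac{1}{n} \sum_{i=1}^{n} \bigl( x_i - \hat{x}_i \bigr)^2,
\quad \text{with } \hat{X} = \psi\bigl(\phi(X)\bigr).
```

Under this reading, the quantity being minimized is the mean squared reconstruction error of a window, and a window whose error is unusually large is treated as anomalous.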
In order to build an RSSI profile of s (through the use of LSTM autoencoder), the receiving node r begins the process of collecting and assembling a time-series of RSSI values extracted from the data frames transmitted by s. Then, using a rolling window of size n, the time-series is segmented into m different overlapping sequences (where the extent of the overlap is controlled by the shift constant of the rolling window), which are further used to train the LSTM autoencoder. Since the LSTM autoencoder is supposed to learn the reconstruction of the input sequences, the m training inputs are also supplied as the expected outputs to the training algorithm with the mean squared error (MSE) as the loss function.
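The rolling-window segmentation described above can be sketched as follows. The function name and the sample values are assumptions for illustration; only the window size n, the shift constant, and the use of each window as both input and expected output come from the text.

```python
def rolling_windows(series, n, shift=1):
    """Split `series` into overlapping windows of length n, stepping by `shift`."""
    if n <= 0 or shift <= 0:
        raise ValueError("n and shift must be positive")
    return [series[i:i + n] for i in range(0, len(series) - n + 1, shift)]


# Hypothetical RSSI time-series in dBm (illustrative values only).
rssi_series = [-60, -61, -59, -60, -62, -60, -61, -58, -60, -59]

windows = rolling_windows(rssi_series, n=4, shift=2)
m = len(windows)

# An autoencoder is trained to reproduce its input, so each window
# serves as both the training input and the expected output.
training_pairs = [(w, w) for w in windows]

print(m)
print(windows[0], windows[-1])
```

A smaller shift yields more, and more heavily overlapping, training sequences from the same recording; a shift equal to n yields non-overlapping windows.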
At runtime (i.e., during the actual use of the trained LSTM autoencoder for the purpose of attack/anomaly detection), the n most recently observed RSSI samples are fed into the trained LSTM autoencoder, and the MSE of the reconstructed sequence (relative to the provided input) is computed. Our experimental investigations (as described in Section 5) have demonstrated that the MSEs of the training data, in the absence of attack/spoofed instances, follow a normal distribution. Therefore, our system uses the Z-score, which measures the deviation from the expected MSE, as the decision function to differentiate between spoofed and normal traffic. Specifically, when the Z-score exceeds a threshold l, the system declares the inspected RSSI window malicious, where l can be determined experimentally and set for the desired false positive rate.
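The decision rule can be illustrated with a short sketch. It assumes a one-sided test (only unusually large reconstruction errors are flagged) and uses the sample mean and standard deviation of the training MSEs as the baseline; the function names and all numeric values are hypothetical, and the reconstruction itself would come from the trained autoencoder, which is not shown here.

```python
from statistics import mean, stdev


def fit_mse_baseline(training_mses):
    """Summarise the attack-free training MSEs by their mean and standard deviation."""
    return mean(training_mses), stdev(training_mses)


def mse(window, reconstruction):
    """Mean squared error between an RSSI window and its reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(window, reconstruction)) / len(window)


def is_malicious(window_mse, baseline, l):
    """Flag the window when its MSE Z-score exceeds the threshold l (assumed one-sided)."""
    mu, sigma = baseline
    z = (window_mse - mu) / sigma
    return z > l


# Hypothetical MSE values observed on attack-free training windows.
baseline = fit_mse_baseline([1.0, 1.2, 0.8, 1.1, 0.9])

print(is_malicious(1.1, baseline, l=3.0))
print(is_malicious(2.0, baseline, l=3.0))
```

Raising l lowers the false positive rate at the cost of letting larger deviations pass, which is why the threshold has to be tuned experimentally.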
4.2. Multiclassifier and Model Switching
As discussed in Section 1, in IoT environments with moving objects (e.g., residential or commercial premises), the RSSI time-series of a transmitting node can be divided into two significantly different time-series with substantially different volatility (i.e., a time-series with a clustering effect). Using the entirety of such a time-series (refer to Figure 1) for the training of our system’s LSTM autoencoder would result in a less sensitive anomaly detection model. Thus, we propose to train and deploy two independent LSTM autoencoders: one for the volatile period of the observed time-series, when moving objects are present, and one for the relatively calm period, when the volatility is at its minimum.
Now, one obvious issue that would have to be adequately addressed in an anomaly detection system with two LSTM autoencoders is the issue of their scheduling. As one possible approach, the system operator could manually set the exact time when each of the trained LSTM autoencoders is to be deployed according to his/her knowledge of the environment. However, in such a system with manually determined ‘switch times’, a number of potential problems could arise. For example, an employee of a factory showing up earlier than usual could significantly affect the RSSI time-series of the nearby sensors/transmitters, which as a result could trigger a false positive alert (provided the detection model corresponding to the non-volatile conditions is still active).
One way to resolve the above challenges is to simultaneously monitor the MSE Z-scores output by the two models at runtime and to look for the point in time when the Z-score of one model crosses that of the other. For example, as shown in our experimentation and depicted in Figure 4, at night, when there are fewer moving objects in the environment, the night’s LSTM autoencoder model reconstructs the RSSI time-series accurately, as reflected by its low Z-score, while the day’s LSTM autoencoder does a poor job of reconstructing the same RSSI time-series. However, during the transition period, when moving objects start to appear in the environment, the performance of the night’s LSTM autoencoder starts to decline, while the day’s LSTM autoencoder (which is trained to cope with daytime volatility) starts to exhibit a noticeable improvement in reconstruction MSE. Thus, the moment when the two Z-score time-series cross each other is the optimal point in time for the system to switch from the nighttime to the daytime LSTM autoencoder model. This suggests that, by simply monitoring the output of both trained LSTM autoencoder models, it is possible to determine the optimal ‘switch time’ in an adaptable and automated manner.
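The crossover rule can be sketched as follows. The function name, the model labels, and the optional `margin` parameter (a small hysteresis that is not part of the description above) are assumptions made for illustration, as are the Z-score values.

```python
def switch_points(z_night, z_day, start="night", margin=0.0):
    """Return (time index, newly active model) for every crossover.

    The active model is replaced as soon as the other model's Z-score
    drops below the active model's Z-score by more than `margin`.
    """
    active = start
    switches = []
    for t, (zn, zd) in enumerate(zip(z_night, z_day)):
        z_active, z_other = (zn, zd) if active == "night" else (zd, zn)
        if z_other + margin < z_active:
            active = "day" if active == "night" else "night"
            switches.append((t, active))
    return switches


# Hypothetical Z-score traces around a night-to-day transition.
z_night = [0.2, 0.3, 0.9, 2.5, 3.0]  # night model degrades as movement begins
z_day = [3.0, 2.6, 1.5, 0.8, 0.4]    # day model improves over the same period

print(switch_points(z_night, z_day))
```

With noisy Z-scores, a margin of zero could cause repeated switching near the crossover; a small positive margin is one simple way to damp that, at the cost of a slightly delayed switch.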
6. Discussions and Conclusions
In this work, we have proposed a novel RSSI-based MAC spoofing detection approach using a multi-model LSTM autoencoder classifier. The advantages of our approach over earlier works in this field are twofold. First, our approach is capable of coping with periodic environmental (i.e., signal) disturbances caused by moving objects. Second, our approach can tolerate and detect the presence of an adversary that transmits within close time intervals of legitimate network devices.
As part of this research, we have also studied the variability of RSSI streams in a real-world residential area, and (from the collected measurements) we have confirmed the existence of two very distinct periods in the observed RSSI streams (i.e., day vs. night). These observations provide real-world justification for the use of a bi-modal LSTM autoencoder, with one autoencoder being trained for each variability period. In addition, we have proposed an automated and adaptive technique for determining the optimal point in time to switch between the two trained models.
It may be worth clarifying that one of the key assumptions of our work is that the IoT network utilizing our solution is composed of a large number of sensing nodes (which are in charge of collecting and transmitting sensory readings from their immediate environment) and one or a few sink nodes (which are in charge of receiving and/or aggregating the sensory readings received from multiple sensor nodes). Furthermore, we assume that the sink nodes are generally more powerful (e.g., have better energy and processing capacity) compared to the sensing nodes.
Now, given the inherently ‘one-way’ nature of the assumed application and the respective communication patterns (i.e., sensors transmit while sinks receive), the most likely targets of an adversary in this environment (i.e., the most likely recipients of spoofed packets) would be the sink nodes, and only very rarely the ‘ordinary’ sensing nodes. Consequently, it is reasonable to assume that the proposed solution would have to be implemented primarily, if not exclusively, on the sink nodes in order to help verify the authenticity of received sensor readings. As previously clarified, sink nodes are generally assumed to have reasonable energy and processing capabilities.
It is also worth pointing out that our proposed LSTM autoencoder approach uses one-dimensional data (i.e., RSSI readings) as input, which makes the training of our model(s) fast and inexpensive in terms of energy, even for Zigbee IoT nodes such as those used in our experiments. Furthermore, using the trained LSTM autoencoders at runtime relies on very simple matrix multiplications, which are of similar complexity to the SVM, linear regression, or Gaussian models previously proposed in the literature, and which are well within the capabilities even of IoT nodes with limited energy and computational resources.
Given that most IoT networks have multiple participants, it is natural to wonder how our proposed method could be further expanded should participating nodes be capable of and/or willing to cooperate with each other in order to detect an ongoing MAC spoofing attack. Although such an idea could likely enhance the overall detection and network performance, it also requires careful consideration and engineering in order to ensure robustness against, for example, potential Byzantine nodes. We are planning an in-depth investigation of such a cooperative multi-node approach as one of the future research directions of our work.
In our previous work [10], we proposed an RSSI-based randomization technique for protection against an active adversary capable of modifying its transmission power and its location in the target/victim environment. Such randomization could benefit our proposed method, but the classification performance might change drastically under a randomized scheme. Finally, in this work we have assumed that the system operator is in charge of detecting low vs. high volatility periods in the training RSSI time-series and of dividing the training set into two subsets for training the proposed bi-modal LSTM autoencoders. However, one could argue that, due to the variability in RSSI during the presence vs. absence of moving objects, it is possible to detect the two periods (for separating the training datasets used to build the multi-model classifiers) using unsupervised clustering approaches such as k-means, instead of relying on the judgment of a system operator. This is certainly an interesting direction for future work that could further enhance our proposed crossover model-switching indicator.