1. Introduction
Modern technologies, such as the Internet of Things (IoT), blockchain, and artificial intelligence (AI), are transforming the legacy Internet into the next-generation Internet, where everything is interconnected. The recent proliferation of IoT has brought these technologies closer together to improve the quality of life. Applications such as smart cities, smart grids, smart vehicles, and smart industries have had a notable impact on national economies. Through IoT applications, devices connected to the Internet can be monitored and controlled remotely, enabling users to access their systems from any location. IoT also enables system automation with better connectivity and communication, which results in more efficient system operations and better productivity. However, IoT has severe security weaknesses because it relies on lightweight protocols (e.g., message queue telemetry transport (MQTT), wireless fidelity (Wi-Fi), and constrained application protocol (CoAP)) and error-prone communication channels to relay sensitive data [1,2,3]. For example, IoT sensors deployed in a smart home system can be manipulated by attackers, resulting in significant damage to the home and its occupants. Therefore, there is a pressing need for a solution that can address the security and privacy issues of IoT applications [4,5,6].
Security enthusiasts and researchers across the globe have proposed several solutions to overcome the aforementioned security issues of IoT applications (e.g., smart home systems). For example, the authors of [7] proposed lightweight cryptography, which offers robustness, long-range data transfer, and an acceptable level of security. The lightweight algorithms are applied on intelligent IoT devices, and their performance is analyzed on an open standard called the long-range wide area network (LoRaWAN), which defines the communication protocol for low-power wide area network (LPWAN) technology. Similarly, the authors of [8] proposed another lightweight scheme based on elliptic curve cryptography for securing the data communication between the nodes in an IoT infrastructure. The proposed technique was compared with conventional lightweight algorithms to determine which one secures data most efficiently. In [9], the authors proposed a Java-based encryption system to provide a more efficient security framework for data kept in cloud storage. The approach combines the Rivest–Shamir–Adleman (RSA) and data encryption standard (DES) algorithms, thus strengthening the security of the data before storing them on the cloud. However, cryptographic algorithms do not offer automation or data immutability; moreover, with recent computing power, it is becoming easier to break a cipher, which degrades the security of the smart home system. Further, the works in [10,11,12] used a conventional signature-based intrusion-detection system to minimize the security issues of smart home systems. Nevertheless, an intrusion-detection system has to rely on a high volume of data, which is a significant challenge because the system has to handle the data efficiently without introducing latency. Moreover, it has to strike a balance between detecting true intrusions and minimizing false alarms [13,14].
The sensors associated with an IoT application collect data readings from the surrounding environment (e.g., the sensor of a water treatment plant collects readings of chlorine levels), and these readings are essential for the sensors to accomplish a shared task [15,16]. However, it has been observed from the literature that these readings are either manipulated by adversaries or mistakenly corrupted by the legitimate personnel of the IoT application. Such data are formally known as anomalies, and it is essential to detect and remove them to enhance the performance of the IoT application. Recently, AI algorithms have shown a remarkable improvement in detecting anomalies and enhancing the security of IoT applications. For example, Emmanuel et al. in [17] present an AI-based solution comprising extreme machine learning techniques for classification tasks. In addition, a regression-based solution is also examined for anomaly detection in smart home systems. The authors have primarily focused on intrusion and anomaly detection on the Mozilla Gateway installed in their sensor network infrastructure. With modifications to the aforementioned hybrid model, the authors achieved significant accuracy in anomaly detection. Similarly, Sihai et al. in [18] used ensemble techniques for anomaly detection in the smart home infrastructure. To overcome the issue of AI model overfitting, the authors combined the synthetic minority over-sampling technique (SMOTE) with ensemble machine learning models for better efficiency when the model works with an unbalanced dataset.
Nevertheless, AI algorithms are not protected against data integrity issues, where the attacker can target the IoT data to jeopardize AI learning. To resolve this issue, blockchain is a prominent solution that offers secure data storage. Researchers have explored multiple techniques to integrate blockchain technology into the smart home infrastructure to preserve data privacy [19]. For example, to preserve the privacy of traditional smart home systems, the authors of [20] proposed a homomorphic consortium blockchain framework to strengthen the security of the sensitive data in the infrastructure; the framework comprises an algorithm in which verification nodes are required to verify the working nodes and the transactions occurring in the network. The authors also introduce a new block data structure based on homomorphic encryption. They evaluated their proposed work in terms of data availability, security, and robustness, where it outperforms existing state-of-the-art works. Similarly, the authors of [21] introduced a private blockchain network using received signal strength indicator (RSSI)-based trilateration to secure data privacy in the smart home infrastructure. The authors proposed a three-layer intrusion detection system (IDS) to detect cyber-attacks in IoT networks. To track the sources of an attack, the Kalman filtering method has been incorporated into the trilateration. The proposed system was tested on a physical setup to compare it with existing systems.
However, the aforementioned approaches do not combine AI and blockchain to strengthen the security of smart home systems. Motivated by the above-mentioned papers on anomaly detection and blockchain implementation in smart home systems, we propose a robust framework in which both components required to secure smart home systems against external threats are incorporated and made to work together. In the case of any abnormal activity, the proposed framework generates an alert and simultaneously examines the threshold levels of the nominal data to protect the system from failure. Further, the data are stored in a blockchain network for immutability, which keeps the recorded data safe from further tampering.
1.1. Research Contributions
We propose an AI- and blockchain (BC)-enabled secure framework to tackle network-related attacks on smart home systems. Since the IoT sensors deployed in smart home systems use weak network interfaces and protocols, attackers leverage this situation and exploit the sensor data exchange. Consequently, the susceptibility of these systems to cyber threats and unauthorized access is significantly heightened, which poses serious security risks and underscores the need for robust protective measures. To address this challenge, the proposed work utilizes a standard smart home system dataset to train AI classifiers (such as K-nearest neighbor (KNN), support vector machine (SVM), linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA)) to classify attack and non-attack data. Prior to classification, however, we employ anomaly-detection algorithms, such as local outlier factor (LOF) and isolation forest (IF), to remove falsified data from the original smart home system dataset. The rationale is that training the AI classifiers on falsified data deteriorates the AI training, which jeopardizes the operational performance of the smart home systems.
Further, we adopted the interplanetary file system (IPFS)-based Ethereum blockchain to confront data integrity issues. Here, the non-attack data from AI classifiers are allowed for secure data storage. For that purpose, a smart contract is designed, where different user-defined functions are utilized to validate the non-attack data. Incorporating IPFS improves the response time and the scalability of the blockchain network. The proposed framework is evaluated using different performance metrics, such as accuracy, lift curve, validation curve, and the blockchain’s transaction and execution cost. A training accuracy of 99.27% is achieved while finding the anomalies and 99.53% while classifying the attack and non-attack data. Further, due to the incorporation of IPFS, we achieved a scalability of 86.23% compared to the conventional blockchain.
1.2. Organization
The article is divided into sections, where Section 2 showcases the literature review; Section 3 introduces the main aim of the proposed work; Section 4 presents the proposed framework comprised of cognitive, AI, blockchain, and application layers to achieve the aim specified in Section 3; and the results and discussion are presented in Section 5. Finally, in Section 6, we conclude the article by providing the main insights of the proposed work.
2. Related Works
Researchers worldwide have reported applications of IoT in several domains, including smart home systems. However, these studies do not explore the possibilities of integrating AI and blockchain in resolving the security issues of smart home systems. For example, Cultice et al. in [22] proposed an autoencoder-based system to detect anomalies in smart home systems. They primarily focus on applying neural network algorithms in smart home systems and implementing them to prevent hazards in the environment where they are installed. Further, Lee et al. [23] present a blockchain-enabled secure solution to overcome security threats, such as device vulnerabilities and data integrity issues, for the home gateway network. Their solution offers decentralization, immutability, and transparency to overcome the challenges associated with centralized systems. The proposed framework implements the developed blockchain network on the Ethereum platform. They assess their smart contracts using security response time and accuracy, and the results reveal that their smart contracts are more effective than those of existing works.
Further, Hamed et al. in [
24] proposed a detailed, layered system architecture for the IoT infrastructure called “AI4SAFE-IoT”. The developed architecture comprises security protocols and machine learning in its different layers to confront various IoT-related attacks. Their proposed system successfully detects attributes and also identifies the stage of an attack life cycle based on the “Cyber Kill Chain” model. The authors evaluated the proposed architecture based on the “IoT service management” score, where they achieved considerable results. Then, Prarthi et al. in [
25] developed an anomaly-detection algorithm called “PiForest”. They first surveyed the implementation of various anomaly-detection algorithms in several cases and calculated the accuracy of the implemented algorithms. The authors further proposed their own anomaly-detection algorithm and implemented it in a real-time scenario. The accuracy of the “PiForest” algorithm was also compared with the accuracy of other algorithms to determine the performance of the developed algorithm.
Similarly, Subhi et al. [
26] proposed an AI- and blockchain-based architecture to secure various IoT applications in the smart city. Their solution can automate certain tasks, such as environment monitoring, data aggregation, and data analysis. The analyzed data are forwarded to the AI expert engine for offering predictive services. Their experimental results show that the employed AI models achieved 95% accuracy. Further, they utilized blockchain to store the actual data after the classification task. Further, in [
27], blockchain was adopted for a secure natural gas transaction framework. In their framework, buyers and sellers interact with each other to purchase the gas contract and maximize their profit. However, the solution did not utilize intelligence and automation to classify the attack and non-attack data. Next, the authors of [
28] combine AI and blockchain to promote sustainable IoT by addressing the security and privacy issues of the smart city. Alternatively, the work proposed by [
29] uses a learning engine for a smart home communication network that uses blockchain and cloud-based data evaluation to improve security. The proposed algorithm outperforms existing methods in terms of computation complexity, false authentication rate, and qualitative parameters.
It is also observed from the literature that most existing solutions do not amalgamate both AI and blockchain to strengthen the performance of smart home systems [
30]. For instance, they did not consider anomaly detection with classification. Further, their blockchain-based solutions are computationally expensive because blockchain has to process both attack and non-attack data. Further, the work proposed by [
31] uses differential privacy and the indispensable properties of blockchain to enhance security in smart home systems. Their results show outperformance regarding scalability, confidentiality, and resilience against data tampering. To analyze the feasibility of blockchain technology in the smart home system, Arif, Yiyang et al. in [
32,
33] examine the adaptability of blockchain by developing a consortium blockchain-based testbed. Only a few papers have shown an amalgamation of blockchain and AI specifically for smart home systems. For example, Ref. [
34] proposed a private-blockchain-based smart home network architecture that integrates an AI model for intrusion detection. Similarly, the authors in [
35] use AI and blockchain to propose a secure monitoring system for the COVID-19 outbreak. However, most of the aforementioned papers incorporate AI and blockchain for other applications and do not consider smart home systems. Moreover, most researchers have addressed this combination only in the form of survey or review papers [
36]. From that viewpoint, we propose a secure and intelligent framework for secure data exchange in the smart home environment by incorporating AI and blockchain technology.
In this context,
Table 1 displays the comparative analysis between the state-of-the-art works and the proposed work. The proposed work offers a secure pipeline in which anomalous data points are first detected and removed from the smart home system dataset. Further, the employed AI models separate attack and non-attack data, and only non-attack data are forwarded to the blockchain network for secure storage.
3. System Model and Problem Formulation
This section elaborates the system model for the proposed framework, which consists of a set of smart homes, each equipped with various smart sensors. Each smart home has at least one sensor, or a group of multiple sensors, deployed at various locations to offer smart home services.
Each sensor has a sensing capability, such as tracking the air quality, controlling the temperature of the freezer pipe, or sensing motion. The outputs of these sensing capabilities are the data readings of the sensors, denoted as D. A source sensor sends its data reading to a receiver sensor, which takes the essential action. For example, if the freezer temperature rises to a certain threshold, an immediate alert is generated to lower the temperature. To offer such services, each sensor has to exchange data with other sensors; here, lowering the temperature is the specific action taken. In general, each data reading has to respect a specific threshold; otherwise, a necessary action is triggered in the smart home system. Moreover, the source sensor uses a network interface (e.g., public internet, Wi-Fi, etc.) to relay its data reading to the receiver sensor. The Wi-Fi network is open to various network-related attacks, such as session hijacking, data integrity attacks, malware, and DDoS, that can deteriorate the performance of smart home systems. An attacker can exploit the communication channel and manipulate the data exchange between the source sensor and the receiver sensor. In addition, the attacker can deploy a rogue sensor node in the smart home system that acts as a man in the middle to manipulate the data exchange, so that the receiver sensor obtains a manipulated data reading instead of the original one. Therefore, there is a need for an automated and intelligent mechanism that can detect such malicious activity and resolve the security and privacy issues of the smart home system.
The main aim of the proposed work is to secure the data exchange between the source and the receiver sensor. For that purpose, an objective function is formulated over the data reading relayed between the source sensor and the receiver sensor.
4. Proposed Framework
This section presents the proposed framework for the IoT-based smart home system. The proposed framework has multiple layers, i.e., cognitive, AI, blockchain, and application layers, that provide a sequential flow, i.e., data acquisition from sensors, classifying the data (malicious or non-malicious), and securing them in the blockchain.
Figure 1 depicts the proposed framework with its associated entities. A summarized explanation of
Figure 1 is as follows.
4.1. Cognitive Layer
The cognitive layer consists of an IoT-based smart home system that comprises several smart sensors, such as thermostats, motion sensors, light, water leak, and smoke sensors. These sensors capture the surrounding data (d) to trigger a specific action associated with an event. For instance, if a water leak sensor detects a leak in a sewage pipe, it triggers an alarm system. The set of sensors S installed in the smart home system includes, for example, temperature, humidity, carbon monoxide, and butane sensors. Each sensor holds a multitude of data points that pertain to its operational aspects, including sensor readings, updates, and maintenance records, where each data point represents a specific piece of information in the smart home system. Further, a sensor transmits its data to another sensor to accomplish a collaborative task (e.g., turning on the light or detecting an open window).
The data collected by these sensors are vital and need to be secured from adversaries that try to manipulate them to degrade the performance of the smart home system. Moreover, an adversary k can use a malicious sensor that impersonates a legitimate sensor, sends malicious data (a payload) to the other sensors, and jeopardizes the efficiency of the sensors deployed in the smart home system. In addition, the smart home system utilizes the public network, which is open to several attacks, such as sniffing the network traffic, session hijacking, and data integrity attacks [37,38]. An attacker can easily exploit such open networks to thwart the data dissemination of the sensors attached to the smart home system.
Moreover, conventional solutions are not automated or intelligent enough to detect such malicious activities in the smart home system. Therefore, there is a need for a proactive mechanism that efficiently detects malicious activities in the smart home system.
4.2. AI Layer
In this section, we present the working mechanism of the AI layer, which adopts different AI algorithms, such as KNN, SVM, LDA, and QDA. This subsection is divided into three parts, i.e., Dataset Description, Dataset Preprocessing, and Anomaly Detection and Classification Task. A detailed explanation of each part is as follows.
4.2.1. Dataset Description
The cognitive layer collects malicious and non-malicious data from the smart home system. For this purpose, we used a standard smart home system dataset, i.e., the TON IoT dataset [39], which comprises data from different IoT sensors, such as garage door, refrigeration, weather, and motion sensors. The entire dataset is bifurcated into different service profiles, i.e., IoT fridge activity, IoT garage door activity, location tracker activity, thermostat activity, and many others. The dataset of each service describes the features of the activity, such as “fridge temperature” in the fridge activity, “latitude” and “longitude” in the location tracker activity, and other relevant features in the other service profiles. From [39], we acquired multiple smart home datasets, namely, the garage door, fridge activity, GPS tracker, motion activity, and weather datasets, which together form the smart home system dataset. Each of these datasets comprises a number of rows and columns, as represented in Equation (12).
4.2.2. Dataset Preprocessing
In this phase, the dataset is preprocessed using standard data preprocessing steps [40,41]. The dataset contains inconsistencies, such as missing values, not a number (NaN) values, infinity (∞) values, non-normalized columns, and columns that require datatype casting. The missing, infinity, and NaN values are filled using a central tendency value, i.e., the mean.
Further, we analyzed the scale of the dataset and found that the values of the ith column are not on a common scale. Therefore, normalization has to be performed on all columns of the dataset. From that viewpoint, we utilized the min–max scaler, which is expressed as
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x'$ is the rescaled output for $x$, which is in the range [0,1], $x$ is the input value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the ith column of the dataset. Further, the dataset has columns that are incompatible with AI models due to their datatype. For example, a conditional probability-based AI algorithm cannot adopt a column with an object datatype. Hence, a suitable datatype conversion has to be performed on these columns.
Here, in Equation (16), an explicit datatype casting has been performed so that the AI algorithms can train on the dataset. The result is the final preprocessed dataset.
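The preprocessing steps described above can be illustrated with a short Pandas sketch. It is only a minimal example under stated assumptions: the column names, the toy values, and the helper name preprocess() are hypothetical, infinity values are treated like missing values, and non-numeric columns are cast to integer category codes, which is one possible casting strategy rather than the one confirmed by the source.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Cast, clean, and min-max scale a sensor dataframe (illustrative only)."""
    out = df.copy()
    # Cast non-numeric columns to integer category codes (assumed strategy).
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].astype("category").cat.codes
    out = out.astype("float64")
    # Treat +/- infinity like missing values, then fill gaps with the column mean.
    out = out.replace([np.inf, -np.inf], np.nan)
    out = out.fillna(out.mean())
    # Min-max scaling: x' = (x - min) / (max - min); constant columns map to 0.
    span = (out.max() - out.min()).replace(0, 1.0)
    return (out - out.min()) / span

# Toy data with hypothetical column names.
raw = pd.DataFrame({
    "fridge_temperature": [4.0, np.nan, 8.0, np.inf],
    "temp_condition": ["low", "high", "high", "low"],
})
clean = preprocess(raw)
print(clean)
```

After this step, every column is numeric, free of NaN and infinity values, and rescaled to the range [0,1].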
4.2.3. Anomaly Detection and Classification Task
Once the dataset is preprocessed, it is forwarded to the AI layer, where different AI models are employed for anomaly detection and classification purposes [
42,
43]. Here, the preprocessed dataset
is split into the training and the testing datasets to validate the parameters of the trained model.
The preprocessed dataset is split into training and testing parts with fractions of 0.8 (80%) and 0.2 (20%), respectively, using the train_test_split() method. The model is validated against multiple parameters through which its performance is analyzed, and its accuracy is verified by evaluating the trained model on the test data. Before classification is performed, the dataset is checked for anomalies, i.e., whether the attacker has manipulated the dataset or not. If an attacker has forged the dataset values, the AI models are trained on manipulated data and provide false results, which jeopardizes the performance of the entire smart home system.
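As a minimal sketch of the 80/20 split, the following example uses train_test_split() from scikit-learn on synthetic data. The feature matrix, the labels, the fixed random_state, and the stratify option are assumptions made for illustration and are not taken from the source.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((100, 4))          # 100 synthetic rows with 4 features
y = rng.integers(0, 2, size=100)  # synthetic labels: 0 = non-attack, 1 = attack

# 80% training and 20% testing; stratify keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```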
In the AI layer, first, the anomaly-detection algorithms are iterated on the dataset to detect the behavior of the data, i.e., whether the data are anomalous or not. The algorithm detects the outliers or anomalies in the data and classifies them in the categories of anomaly and nominal data. Through model performance analysis, we found that IF is the best algorithm amongst other anomaly-detection algorithms that can efficiently detect outliers as an anomaly.
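The anomaly-filtering step can be sketched with the IsolationForest and LocalOutlierFactor estimators of scikit-learn. This is an illustrative example on synthetic two-dimensional data; the contamination value, the number of estimators, and the number of neighbors are assumed settings, not the hyperparameters used in the proposed work.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
nominal = rng.normal(loc=0.5, scale=0.05, size=(200, 2))       # tight nominal cluster
outliers = np.array([[5.0, 5.0], [-4.0, 6.0], [6.0, -5.0]])    # injected anomalies
X = np.vstack([nominal, outliers])

# Isolation forest: fit_predict returns +1 for nominal points and -1 for anomalies.
iso = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
iso_labels = iso.fit_predict(X)

# Local outlier factor, shown for comparison, uses the same label convention.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
lof_labels = lof.fit_predict(X)

# Keep only the points that the isolation forest considers nominal.
X_clean = X[iso_labels == 1]
print(len(X), "->", len(X_clean))
```

Only the filtered data would then be passed on to the classification stage.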
The algorithmic flow of IF is similar to that of the random forest algorithm. The data point (tuple) processed at a given iteration is segregated from the rest of the data to determine its behavior (anomaly or nominal). IF operates by constructing an ensemble of isolation trees, where each tree is one estimator of the ensemble. Each isolation tree is built by randomly selecting a feature and a random split value within the range of that feature. The feature and split value are used to partition the data into two subsets, which is known as random partitioning. This process is repeated recursively until each data point is isolated in its own leaf node, and the number of partitions required to isolate a point is its path length. Once the trees are formed, the anomaly score of the data point is calculated to determine the nature of that instance. The anomaly score of a data point can be formulated as
$$2^{-\frac{E(h(o))}{c(s)}}$$
where $o$ represents the data point for which the anomaly score is being calculated. The term $E(h(o))$ is the average path length of the data point $o$ across all trees in the ensemble. Further, $c(s)$ is the normalization factor, i.e., the average path length along the isolation trees, where $s$ represents the total number of data points in the dataset. The term $c(s)$ is defined through the formula
$$c(s) = 2H(s-1) - \frac{2(s-1)}{s}$$
where $H(i) \approx \ln(i) + 0.5772$ is the harmonic number. The structure of an isolation tree is the same as that of a binary tree, where each parent node has exactly two child nodes. Thus, $c(s)$ is defined in the same way as the average path length of an unsuccessful search in a binary search tree. The value of the anomaly score determines the behavior of the point. If the score is close to 1, the point is classified as anomalous; if it is around 0.5 or lower, it is classified as a nominal point. The updated dataset is then anomaly-free, containing only nominal data. However, it still contains attack and non-attack data, since attackers have performed various network-related attacks to manipulate the performance of the smart home system. Therefore, classification algorithms are needed to classify the data (attack or non-attack) in the anomaly-free dataset. Supervised learning algorithms are implemented and tested using various performance metrics to classify the data. The result analysis shows that the KNN algorithm performs well compared to the other AI models. The performance metrics of the evaluated models are discussed in Section 5. The KNN algorithm classifies a data point through the distance metric and the number of neighbors defined in the algorithm. Multiple distance metrics are available, including Euclidean, Manhattan, and Minkowski. The Euclidean distance metric is used here for classification in the KNN model. The Euclidean distance between two data points can be formulated as
$$d(p, q) = \sqrt{\sum_{i=1}^{m} (q_i - p_i)^2}$$
where $d$ is the calculated distance between two data points $p$ and $q$ in the dataset, and $m$ is the number of features. The Euclidean distance between the selected data point and the other data points is computed, the $n$ nearest neighbors are selected, and the behavior of the selected data point or tuple is determined by the most frequent behavior among these $n$ neighbors. The algorithmic flow of the KNN model is shown in Algorithm 1.
Algorithm 1 Working mechanism of the KNN algorithm
Input: Anomaly-free dataset
Output: Classification of attack and non-attack data
1: procedure CLASSIFICATION(C)
2:   Dataset → KNN
3:   Select number of neighbors n.
4:   Select the distance metric.
5:   Calculate the distance through distance metric.
6:   Find the nearest neighbors.
7:   if the probability of attack exceeds the probability of non-attack then
8:     Classify as an attack.
9:   else
10:     Classify as non-attack.
11:   end if
12: end procedure
The probabilities of encountering attack and non-attack data in the vicinity of the selected point play a crucial role in this decision. Both probabilities are derived from the KNN model, which, based on its training data and distance metric, estimates the chance of a given point being associated with either an attack or a non-attack scenario. The first probability represents the likelihood of encountering attack-related data points near the selected point, while the second signifies the probability of finding non-attack data points in the same vicinity. These estimates allow us to assess the risk associated with a specific data point and to make informed decisions in the context of security or anomaly detection. After classification, the predicted behavior of the data point is inspected: if it is found to be an attack, the proposed system generates an alert; otherwise, the data are stored in the blockchain network described in the blockchain layer.
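The classification and routing step described above can be sketched with the KNeighborsClassifier of scikit-learn. The example is a hedged illustration on synthetic data: the two clusters, the choice of five neighbors, and the helper name route() are assumptions, and the returned strings only stand in for the alert and blockchain-storage actions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
non_attack = rng.normal(loc=0.2, scale=0.05, size=(50, 2))
attack = rng.normal(loc=0.8, scale=0.05, size=(50, 2))
X = np.vstack([non_attack, attack])
y = np.array([0] * 50 + [1] * 50)   # 0 = non-attack, 1 = attack

# KNN with the Euclidean distance metric and n = 5 neighbors (assumed value).
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X, y)

def route(sample):
    """Return 'alert' for attack data and 'store' for non-attack data."""
    # predict_proba orders its columns by class label: [P(non-attack), P(attack)].
    p_non_attack, p_attack = knn.predict_proba([sample])[0]
    return "alert" if p_attack > p_non_attack else "store"

decisions = [route([0.21, 0.19]), route([0.79, 0.82])]
print(decisions)
```

Here the neighbor vote fractions returned by predict_proba() play the role of the attack and non-attack probabilities compared in Algorithm 1.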
4.3. Blockchain Layer
In this layer, the non-attack data from the AI layer are forwarded for secure storage. Conventionally, the non-attack data of a smart home system are stored in a buffer space or web storage, where an attacker can perform several security attacks, such as data manipulation and data injection. For that reason, secure storage is required that is transparent and can tackle data integrity issues. Blockchain technology is a prominent solution to this issue. For that purpose, we designed a smart contract in the Remix development environment, comprising functions such as addauthorized(), changedevicestate(), removeauthorization(), and currentdevicestate(). The incoming non-attack data from the AI classifier are validated using these smart contract functions. The smart contract is attached to a distributed file system storage, i.e., IPFS, which allows the data to be stored securely. For that purpose, a Filebase application programming interface (API) is used that programmatically interacts with IPFS. The aforementioned smart contract functions take the data as a parameter and forward them to IPFS. Once the validated data from the smart contract are uploaded to IPFS via Filebase, a unique content identifier (CID) is received, which is used to retrieve the content later.
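To make the validation logic of the smart contract easier to follow, the sketch below models it as a plain in-memory Python class. It is not the Solidity contract of the proposed work: the class name DeviceRegistry, the notion of a single owner who manages authorizations, and the rule that only authorized devices may update their state are all assumptions made for illustration. The method names loosely mirror the contract functions listed above.

```python
class DeviceRegistry:
    """In-memory stand-in for the smart contract's validation logic (hypothetical)."""

    def __init__(self, owner: str):
        self.owner = owner
        self.authorized = set()   # devices allowed to submit data
        self.state = {}           # latest validated reading per device

    def _only_owner(self, caller: str) -> None:
        if caller != self.owner:
            raise PermissionError("only the owner may manage devices")

    def add_authorized(self, caller: str, device: str) -> None:
        self._only_owner(caller)
        self.authorized.add(device)

    def remove_authorization(self, caller: str, device: str) -> None:
        self._only_owner(caller)
        self.authorized.discard(device)

    def change_device_state(self, device: str, reading: float) -> None:
        # Reject data coming from devices that were never authorized.
        if device not in self.authorized:
            raise PermissionError(f"{device} is not authorized")
        self.state[device] = reading

    def current_device_state(self, device: str):
        return self.state.get(device)


registry = DeviceRegistry(owner="home-gateway")
registry.add_authorized("home-gateway", "thermostat-1")
registry.change_device_state("thermostat-1", 21.5)
print(registry.current_device_state("thermostat-1"))
```

In an actual deployment, the same checks would run on-chain, so that every state change is recorded as a transaction.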
Additionally, the IPFS computes the hash of the original data and forwards the hash to an immutable blockchain ledger. Here, we used an Ethereum-based public blockchain to obtain benefits such as transparency, decentralization, and immutability. As all entities of the smart home system have to register with the blockchain network, it makes the blockchain network transparent. Due to the blockchain’s transparency property, one can find the entity that has performed the data manipulation, hence improving the security and privacy of the smart home system. Further, the data can be fetched from the IPFS node by computing its hash. If the computed and stored hash are the same, we can infer that the data are not manipulated; otherwise, we can simply discard that data and find the adversary behind this act. The entire smart contract and IPFS are deployed in a Sepolia-based test network to analyze the performance of the blockchain network.
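The hash comparison described above can be sketched in a few lines of Python. The example is a simplification under stated assumptions: a SHA-256 digest over a JSON serialization stands in for the IPFS content hash, a dictionary stands in for the blockchain ledger, and the function names are hypothetical.

```python
import hashlib
import json

def digest(record: dict) -> str:
    """Deterministic SHA-256 digest of a record (stand-in for the content hash)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

ledger = {}  # stand-in for the immutable ledger: record id -> stored hash

def store(record_id: str, record: dict) -> None:
    ledger[record_id] = digest(record)

def is_untampered(record_id: str, fetched: dict) -> bool:
    # The data are accepted only if the recomputed hash equals the stored hash.
    return ledger.get(record_id) == digest(fetched)

original = {"sensor": "fridge-1", "temperature": 4.2}
store("rec-1", original)

tampered = {"sensor": "fridge-1", "temperature": 14.2}
print(is_untampered("rec-1", original), is_untampered("rec-1", tampered))
```

Because any change to the record changes its digest, a mismatch between the computed and stored hash signals manipulated data.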
4.4. Application Layer
The application layer receives the data from the blockchain layer, which is given as input to the other sensors available in the smart home system of that particular home in the cluster. If the nominal value stored in the network is found to be close to the predefined threshold values, the actuators present in the smart sensors will perform necessary actions to control the environment, preventing it from any possible hazardous scenarios in the system. Through the seamless coordination of sensor data and responsive actuation, the smart home sensors act as intelligent custodians, ensuring the safety and stability of the smart home system while minimizing risks and promoting operational efficiency.
Figure 2 shows a sequential flow of the proposed framework.
5. Analysis of Results
This section analyzes the results of the proposed architecture using different performance parameters, i.e., accuracy, precision, recall, the lift curve, and the validation curve. Additionally, we present the experimental setup, i.e., the tools, libraries, and software platforms used to develop the proposed architecture.
5.1. Experimental Setup and Tools
The proposed architecture is developed using recent AI libraries and open-source development platforms to write source code, train the AI algorithms, and visualize their performance. For that purpose, the Anaconda distribution (version 6.3.0) is utilized, wherein the Jupyter Notebook is used to write the source code for data preprocessing, data modeling and training, and visualization. The code is written in Python 3.8.8 with libraries such as Pandas, NumPy, Matplotlib, Plotly, and PyCaret. The Pandas library is used for data manipulation and preprocessing. Next, the NumPy library is used for data computation, where the dataset is transformed into arrays for easy computation. We used the PyCaret library with user-defined functions for data modeling and training. Further, Plotly and Matplotlib were used for data visualization. For creating smart contracts, we utilize the Remix development environment (version 0.33.2). In Remix, we used the Solidity language (version 0.8.0) to design the smart contract. The smart contract comprises different user-defined members, i.e., addAuthorizedDevice(), removeAuthorizedDevice(), changeDeviceState(), deviceState(), and authorizedDevices, that validate the non-attack data of the smart home system. These functions are compiled using the Solidity compiler, version 0.8.18+commit.87f61d96. The proposed architecture is implemented on a system comprising an 11th generation Intel(R) Core(TM) i5-1135G7 processor, 12 GB of random access memory (RAM), and an Intel Iris Xe graphics card. The system specification is reported so that readers can reproduce the setup and gauge the expected training and processing times.
5.2. Discussion of Anomaly-Based Results
This section presents the results obtained for anomaly detection in smart home systems. Algorithms like LOF and IF are quite effective in detecting anomalies from real-world applications. For instance, LOF is a density and distance-based algorithm similar to the KNN algorithm, while IF is an ensemble method similar to random forests. The advantage of tree algorithms is that they offer essential benefits in finding anomalies in smart home systems.
Figure 3 illustrates the performance of the proposed framework in terms of the accuracy of detecting anomalies from the smart home system. The x-axis and y-axis represent the detection accuracy and the adopted anomaly-detection algorithms (i.e., IF and LOF), respectively. We used two different libraries to evaluate the performance of the IF algorithm: the IF algorithm from the scikit-learn library (IF_SKL) and the IF algorithm from the PyCaret library (IF_Pycaret). From the graph, it is clear that IF_SKL outperforms LOF: IF_SKL and LOF achieve 99.95% and 74.34% accuracy, respectively. Furthermore, IF_Pycaret achieves 92.12% accuracy, which is also better than LOF. The hyperparameters play an essential role in lifting the model’s performance; in that view, LOF uses the "number of neighbors" parameter to achieve 74.34% accuracy. However, as the number of neighbors increases, the computational complexity of the model increases. LOF has a high computational complexity that grows with the data size n. Moreover, the number-of-neighbors hyperparameter k is used in each iteration, which further increases the computational complexity. In contrast, IF requires no such hyperparameter, and its computational complexity is lower than that of LOF.
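To make the role of the number of neighbors concrete, the following is a plain-Python sketch of the standard LOF definition (k-distance, reachability distance, local reachability density). It is not the scikit-learn or PyCaret implementation used in the experiments; the sensor readings are hypothetical, and the sketch assumes there are no duplicate points. Because this naive version computes every pairwise distance, its cost grows quadratically with the number of points.

```python
import math


def lof_scores(points, k):
    """Return the Local Outlier Factor of every point (naive implementation)."""
    n = len(points)
    # All pairwise distances: the dominant cost of this naive version.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])              # the k nearest neighbors of i
        k_dist.append(dist[i][order[k - 1]])     # distance to the k-th neighbor

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical (temperature, humidity) readings; the last one is far from the rest.
readings = [(21.0, 40.0), (21.5, 41.0), (22.0, 39.5),
            (21.2, 40.5), (21.8, 40.2), (35.0, 80.0)]
scores = lof_scores(readings, k=2)
most_anomalous = scores.index(max(scores))
print(most_anomalous)  # index of the reading with the largest LOF score
```

Scores near 1 indicate a point whose density matches its neighbors; values well above 1 indicate an outlier.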
Figure 4 illustrates the number of anomalous and non-anomalous points detected by the anomaly-detection algorithms. The blue-colored bars in Figure 4 represent the nominal points in the dataset. In contrast, the orange-colored bars represent the anomalous points detected by the anomaly-detection algorithm. The IF algorithm gives the outcomes that agree most closely with the labels in the dataset: its anomalies differ from the original dataset by only three tuples, which results in a high accuracy. To compare the tuples and determine the accuracy, a NumPy comparison function is used, which reports the number of tuples on which the outcomes of two algorithms differ for the selected attribute. The attribute chosen for the comparison is the additional column of behavior 7 of the instance or the “Anomaly” column.
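The tuple comparison can be sketched as follows. This is a pure-Python stand-in for the comparison, not the NumPy call used in the experiments, and the two label columns are hypothetical.

```python
def count_mismatches(reference, predicted):
    """Count the positions at which two equally long label columns differ."""
    if len(reference) != len(predicted):
        raise ValueError("both columns must have the same number of tuples")
    return sum(1 for r, p in zip(reference, predicted) if r != p)


# Hypothetical "Anomaly" columns: 1 = anomalous, 0 = nominal.
reference = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
predicted = [0, 0, 1, 0, 0, 0, 0, 0, 1, 1]

mismatches = count_mismatches(reference, predicted)
agreement = 1 - mismatches / len(reference)
print(mismatches, agreement)  # 2 0.8
```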
5.3. Classification-Based Result Discussion
This subsection presents the results obtained through implementing different AI classifiers, such as NB, KNN, SVM, LDA, and QDA. Here, the classification algorithms are used to detect the behavior of the input value on every instance. We want to remark that the anomaly-detection algorithms are semi-supervised learning algorithms that are not preferred for the classification task. Thus, in such cases, AI-based classifiers prove to be more efficient for classification tasks as they lie in the supervised learning category.
Figure 5 illustrates the comparison of the accuracy obtained for the classification algorithms implemented on the dataset, which includes the class labels, i.e., anomaly and nominal, depicted as “1” and “0”. The accuracy of an AI classifier is formulated as follows.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where $TP$, $TN$, $FP$, and $FN$ represent the true positive, true negative, false positive, and false negative, respectively. The x-axis of the graph represents the algorithm applied, and the y-axis represents the accuracy of the applied AI algorithms. The train–test split function is applied to the dataset to evaluate and enhance detection performance. The model is trained on the training dataset and is tested on the remaining section of the dataset (i.e., the testing dataset). From the graph, it can be seen that the KNN algorithm gives the highest accuracy, which is 99.53%, compared to the other AI algorithms. This is because KNN is simple and relatively easy to implement; moreover, it does not need an explicit training phase, so the prediction for new data points is adjusted without retraining the model.
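The train–test procedure and the KNN classification step can be sketched in plain Python. The data below are synthetic and well separated, and the helper names (train_test_split, knn_predict, accuracy) are illustrative rather than the library functions used in the experiments.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_ratio=0.25, seed=0):
    """Shuffle the indices and split rows/labels into train and test parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_ratio))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic readings: label 0 = nominal cluster, label 1 = anomalous cluster.
rows = [(20 + 0.1 * i, 40 + 0.2 * i) for i in range(20)]
rows += [(34 + 0.1 * i, 78 + 0.2 * i) for i in range(20)]
labels = [0] * 20 + [1] * 20

train_x, train_y, test_x, test_y = train_test_split(rows, labels)
predictions = [knn_predict(train_x, train_y, x, k=3) for x in test_x]
acc = accuracy(test_y, predictions)
print(len(test_x), acc)
```

KNN stores the training set and defers all computation to prediction time, which is what "no explicit training phase" refers to.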
Figure 6, Figure 7, and Figure 8 illustrate other result parameters used to analyze the performance of the KNN algorithm.
Figure 6 shows the confusion matrix of the KNN algorithm. A confusion matrix is a statistical, matrix-based performance parameter used to summarize the performance of a classifier. The confusion matrix is built up using four features. These features are the evaluation parameters used to evaluate a classifier. The parameters integrated into the confusion matrix are as follows.
True Positive ($TP$): The total number of positive outcomes that are correctly classified as positive with reference to the data.
False Positive ($FP$): The total number of negative outcomes that are incorrectly classified as positive by the algorithm.
True Negative ($TN$): The total number of negative outcomes that are correctly classified as negative.
False Negative ($FN$): The total number of positive outcomes that are incorrectly classified as negative.
The true positive, false positive, true negative, and false negative values were found to be 203, 13, 0, and 1705, respectively. Through these features, the accuracy, precision, and recall of the model can be calculated.
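As an illustration of how accuracy, precision, and recall follow from the four confusion-matrix entries, consider the sketch below. The labels are hypothetical and are not the values reported for Figure 6.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for binary labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Accuracy, precision, and recall from the confusion-matrix entries."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Hypothetical ground truth and predictions (1 = anomaly, 0 = nominal).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)            # (3, 1, 3, 1)
print(metrics(*counts))  # (0.75, 0.75, 0.75)
```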
Figure 7 illustrates the lift curve of the KNN algorithm. The x-axis represents the fraction of the data sample considered, and the y-axis represents the corresponding lift. The lift is calculated as the ratio of positive outcomes in the selected sample, divided by the ratio of positive outcomes in the whole dataset. When the data are ordered by predicted probability, the instances with the highest probability appear on the left of the graph, along with the highest lift scores. A lift curve can be regarded as ideal when the fraction of instances with a very high predicted probability of being positive contains most of the real positive labels. The model with the maximum lift is preferably considered the better model. The lift curve for the model represented in Figure 7 is near the ideal condition given the parameters for the analysis of the lift curve.
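The lift calculation can be sketched as follows, assuming each instance has a predicted probability of being positive and that the dataset contains at least one positive label. The labels and scores are hypothetical.

```python
def lift_at(y_true, scores, fraction):
    """Lift of the top `fraction` of instances when ranked by score."""
    # Rank the true labels by descending predicted score.
    ranked = [y for _, y in sorted(zip(scores, y_true),
                                   key=lambda pair: pair[0], reverse=True)]
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(ranked[:top_n]) / top_n     # positive rate in the sample
    base_rate = sum(ranked) / len(ranked)      # positive rate in the dataset
    return top_rate / base_rate


# Hypothetical labels (1 = positive) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.1, 0.4, 0.7, 0.05, 0.15, 0.25]

for fraction in (0.3, 0.5, 1.0):
    print(fraction, round(lift_at(y_true, scores, fraction), 2))
# 0.3 3.33
# 0.5 2.0
# 1.0 1.0
```

The lift always falls to 1 when the whole dataset is selected, which is the baseline against which the curve is read.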
Figure 8 illustrates the validation curve for the KNN algorithm. A validation curve is a graphical performance metric used to evaluate a trained model as a function of the hyperparameters defined in the model. In an ideal condition, the validation curve and the training curve look similar to each other. If the scores of both curves are low, the model is underfitting. Underfitting arises when too much regularization is applied or when the model is informed by too few features. When the training curve reaches a high score much more quickly than the validation curve, the model is overfitting. Further, the model can be evaluated for overfitting conditions, i.e., if the lift curve shows a significant lift for anomalous instances compared to the baseline, it suggests that the model incorrectly captures or classifies anomalous data as nominal. This indicates overfitting and a lack of generalization to unseen anomalous patterns. The closer the nominal and anomalous curves are, the better the model’s performance.
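A validation curve for KNN can be sketched by scoring the model on the training data and on held-out data for several values of the number of neighbors. The one-dimensional data below are hypothetical and deliberately noisy, and the helper names are illustrative only.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def score(train_x, train_y, eval_x, eval_y, k):
    """Accuracy of KNN (fitted on the training data) on an evaluation set."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(eval_x, eval_y))
    return hits / len(eval_y)


def validation_curve(train_x, train_y, val_x, val_y, k_values):
    """Return (k, training score, validation score) for each k."""
    return [(k,
             score(train_x, train_y, train_x, train_y, k),
             score(train_x, train_y, val_x, val_y, k))
            for k in k_values]


# Hypothetical noisy 1-D data: larger values tend to be labelled 1.
train_x = [(float(v),) for v in range(1, 11)]
train_y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
val_x = [(1.4,), (2.6,), (4.4,), (5.6,), (7.4,), (8.6,), (9.6,)]
val_y = [0, 0, 0, 1, 1, 1, 1]

curve = validation_curve(train_x, train_y, val_x, val_y, k_values=[1, 3, 5])
for k, train_score, val_score in curve:
    print(k, train_score, val_score)
```

With k = 1 every training point is its own nearest neighbor, so the training score is perfect; a large gap between that score and the validation score is the overfitting pattern described above.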
The curve in Figure 8 is close to the ideal condition, as the two curves lie near each other and show no overfitting or underfitting. The non-attack data of the smart home systems are stored on different storage platforms for offering varied services to the smart home systems. However, these data can be tampered with via data injection and manipulation attacks. Therefore, the proposed work adopts the indispensable characteristics of blockchain technology, where a smart contract is deployed on an Ethereum-based public blockchain.
5.4. Discussion of Blockchain-Based Results
The designed smart contract has various user-defined functions, such as addauthorizeddevice(), changedevicestate(), removeauthorizeddevice(), and devicestate(), that together act as a validator for the non-attack data. In particular, the authorizeddevice() function includes a threshold-checking parameter. This parameter allows the smart contract to enforce a predefined threshold for authorized devices. The threshold could be a numerical value or a specific condition that needs to be met before a device is considered authorized. For example, the temperature sensor reading must lie within the threshold set by the regulatory bodies. If the sensor reading is out of range, we invoke the removeauthorizeddevice() function to generate an alert and remove that particular device from the smart home system.
The purpose of this function is to ensure that only devices meeting the specified criteria can perform certain actions or access specific resources within the smart home system. By incorporating this threshold checking parameter into the authorizeddevice() function, the smart contract can enforce a more robust and secure authorization mechanism. This mechanism prevents unauthorized or potentially malicious devices from accessing sensitive data or performing unauthorized actions within the blockchain network. On successful validation, the non-attack data are forwarded to the IPFS-based secure storage.
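The threshold-checking behavior can be modelled off-chain in a few lines of Python. This is only a sketch of the logic described above, not the Solidity source of the deployed contract; the class name, method names, device identifier, and the 15 to 30 degree range are all hypothetical.

```python
class DeviceRegistry:
    """Off-chain model of threshold-based device authorization."""

    def __init__(self, low, high):
        self.low, self.high = low, high   # hypothetical permitted range
        self.authorized = set()
        self.state = {}
        self.alerts = []

    def add_authorized_device(self, device_id):
        self.authorized.add(device_id)

    def remove_authorized_device(self, device_id, reason):
        """Revoke the device and record an alert."""
        self.authorized.discard(device_id)
        self.alerts.append((device_id, reason))

    def change_device_state(self, device_id, reading):
        """Accept a reading only from an authorized device within range."""
        if device_id not in self.authorized:
            raise PermissionError(f"{device_id} is not authorized")
        if not (self.low <= reading <= self.high):
            self.remove_authorized_device(device_id, f"reading {reading} out of range")
            return False
        self.state[device_id] = reading
        return True

    def device_state(self, device_id):
        return self.state.get(device_id)


registry = DeviceRegistry(low=15.0, high=30.0)
registry.add_authorized_device("thermo-1")
accepted = registry.change_device_state("thermo-1", 22.5)   # within range
rejected = registry.change_device_state("thermo-1", 75.0)   # out of range
print(accepted, rejected, registry.alerts)
```

After the out-of-range reading, the device is no longer authorized, so any further state change from it is refused.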
Figure 9 shows the deployed smart contract and its different user-defined functions. For deployment, we used the Injected Provider - MetaMask environment, which offers different test networks for smart contract deployment. We utilized the Sepolia test network to deploy the smart contract shown in Figure 9.
When deploying a smart contract, there are two main costs to consider, i.e., transaction costs and execution costs. Transaction costs refer to the fees associated with interacting with the blockchain network to deploy a smart contract. These costs vary depending on the blockchain platform being used and are typically paid in the native cryptocurrency of that platform, for example, Ether in the Ethereum blockchain. Execution costs, in contrast, pertain to the computational resources required to execute the smart contract code once it is deployed on the blockchain. Here, we used an event log and a struct to store the IPFS hash, which offers significant advantages in terms of gas consumption. Event logs hold logged data that are not frequently retrieved. The struct, on the other hand, is used to speed up data retrieval; it organizes and stores data directly within the contract’s state, making it accessible for on-chain operations.
Figure 10 shows the transaction and execution cost incurred while deploying the smart contract in the Ethereum blockchain.
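For orientation, a fee on Ethereum is the gas consumed multiplied by the gas price, with 1 Ether equal to 10^9 gwei. The sketch below applies that arithmetic; the gas amounts and the gas price are hypothetical and are not the values shown in Figure 10.

```python
GWEI_PER_ETH = 10**9


def fee_in_eth(gas_used: int, gas_price_gwei: float) -> float:
    """Fee in Ether for a given gas consumption and gas price."""
    return gas_used * gas_price_gwei / GWEI_PER_ETH


# Hypothetical figures, for illustration only.
deployment_gas = 500_000   # gas consumed by deploying the contract
call_gas = 45_000          # gas consumed by one state-changing call
gas_price_gwei = 20

deployment_fee = fee_in_eth(deployment_gas, gas_price_gwei)
call_fee = fee_in_eth(call_gas, gas_price_gwei)
print(deployment_fee, call_fee)  # 0.01 0.0009
```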
Further, we evaluated the performance of IPFS using scalability parameters. Since IPFS computes the hash of the legitimate non-attack smart home system data and forwards them to the Ethereum-based public blockchain, it improves the response time of the blockchain network. Response time is inversely proportional to the scalability parameter. Therefore, the lower the response time, the higher the scalability.
Figure 11 shows the scalability improvement when IPFS is employed in the blockchain network. As the response time improves, more transactions can be granted, increasing the blockchain network’s scalability.
6. Conclusions
The paper proposed a secure and intelligent framework to handle security threats associated with smart home systems. The literature shows that, owing to open network interfaces, weak protocols, and lightweight encryption, an attacker can exploit these weaknesses and manipulate the operation of the smart home system. The proposed framework employs the automation and intelligence of AI algorithms, which are first trained to remove anomalous data from smart home systems. This is accomplished by training an IF algorithm that uses an ensemble approach to pinpoint and eliminate anomalous data points within the dataset with high accuracy (99.27%). Once the anomalous data are eliminated, the AI classifiers are trained to classify attack and non-attack data. The proposed framework discards the attack data and only allows non-attack data to assist in enhancing the performance of the smart home system. Furthermore, to strengthen the security of the smart home system, the non-attack data are forwarded to the immutable blockchain nodes. For that purpose, we designed a smart contract in the Remix development environment that validates the non-attack smart home system data, and we deployed it on the Ethereum-based public blockchain. The smart contract is connected to the IPFS, which stores the non-attack data. The IPFS computes the hash of the original non-attack data and forwards it to the blockchain’s immutable ledger. Storing the non-attack data of smart homes in this way reduces the chance of data manipulation. The results show that the performance of the proposed framework is better than the existing state-of-the-art work. Here, the IF and KNN algorithms offer 99.53% and 99.27% accuracy in detecting anomalous and attack data, respectively. Moreover, the incorporation of the IPFS with the blockchain network improves the response time and scalability of smart home systems.
Adopting blockchain technology can increase the latency and the mining cost. To respond to this challenge, we utilized IPFS and event logs to minimize the mining cost. However, we want to remark that the mining cost is still a persistent challenge and a significant limitation that necessitates careful consideration. In future work, we will utilize the proof-of-stake (PoS) and hybrid approaches, which aim to reduce energy consumption and lower the barriers to entry for participants. These innovations seek to strike a balance between security, decentralization, and cost-effectiveness, thereby making the proposed work more sustainable and accessible. In addition, we will incorporate the essential benefits of a 5G network interface to reduce the latency of the proposed framework.