1. Introduction
A Wireless Sensor Network (WSN) is a wireless network in which sensor nodes cooperate to measure and monitor physical and environmental conditions. After taking their measurements, the sensor nodes cooperatively send the sensed values to the base station [1]. The data arriving at the base station are then transferred to the internet, allowing users to monitor the environment of interest from wherever they are. An example of a WSN is given in Figure 1.
The sensor nodes that make up a WSN consist of a sensing unit, a processing unit, a power unit, and a radio unit. Sensor nodes have three capabilities: computation, sensing, and communication. They consume the most energy during communication. An application may involve hundreds or even thousands of sensor nodes, so each node should cost as little as possible. To keep the nodes cost-effective, inexpensive hardware is chosen, and therefore the processor, memory, energy, and other resources of a sensor node are limited. There are three types of topologies in a WSN: star, tree, and mesh. Most real applications use a mesh topology.
Moreover, the WSN types are Terrestrial WSN, Underground WSN, Underwater WSN, Multimedia WSN, and Mobile WSN. The requirements for each type are different. There are five layers in a WSN: Physical, Data Link, Network, Transport, and Application. Data Clustering is included in the Network layer because it is related to routing operations. Before developing a new protocol for a WSN, it is necessary to have a good grasp of the characteristics, challenges, and advantages of the WSN. For example, a comprehensive security protocol that consumes much energy may be meaningless for a WSN, and an energy-efficient but delay-insensitive routing protocol may be inefficient for a WSN. These characteristics, challenges, and advantages are listed below [2].
The WSN characteristics: Large scale, Limited resource, Redundancy, Security sensitivity, Data-centric processing, High unpredictability, Real-time constraints.
Challenges of a WSN: Limited Functional Capabilities, Limited Energy, Network Lifespan, Scalability, Redundancy, Lack of Global Identification, Storage, Search and Retrieval, Production Costs, In-network Processing, Latency, and Fault Tolerance.
Advantages of a WSN: Robustness to Withstand Rough Environmental Conditions, Ease of Deployment, Fault Tolerance, Ability to Cover Wide and Dangerous Areas, Self-Configurable, Mobility of Nodes, Unattended Operation, Improved Lifetime, and Improved Accuracy [
3].
A WSN has a wide range of applications, such as Home applications, Environmental applications, Industrial applications, Health applications, Military applications, and Commercial applications. While a WSN is one of the subsets of the Internet of Things (IoT), it is the most used technology in IoT systems.
The basis of smart systems in IoT is the WSN, that is, sensor nodes. Therefore, the WSN forms an essential part of IoT, and a smart world becomes possible with WSNs. There are three parts to developing an intelligent system. First, a value measured by a sensor is sent. Second, a protocol is used to communicate this information. Finally, external systems display these data. The raw material of all smart systems is the sensor nodes in a WSN: Smart Cities, Smart Environments, Smart Water, Smart Metering, Smart Agriculture, and Smart Animal Farming systems are all made smart by a WSN. With a WSN, users can build whatever monitoring application they need. Data transmission by sensor nodes is therefore a fundamental process in a WSN; it is very important today and will continue to develop within the IoT in the future.
Under normal conditions, the data sensed by the sensor nodes are transmitted to the base station hop by hop. However, data aggregation approaches are preferred to reduce the amount of data circulating in the network and to improve energy efficiency. For example, in Data Clustering, the sensed data are transmitted to the collector node, and only the calculated value (sum, average, min, max, or count) is transmitted to the base station. This reduces the amount of data circulating in the network and, with it, the overall energy consumption.
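The aggregation step described above can be illustrated with a minimal Python sketch. The function name `aggregate`, the `AGGREGATORS` table, and the sample readings are assumptions made for illustration; they are not taken from the protocol itself.

```python
from statistics import mean

# Hypothetical table of aggregation functions a collector node might apply.
AGGREGATORS = {
    "sum": sum,
    "average": mean,
    "min": min,
    "max": max,
    "count": len,
}

def aggregate(readings, function="average"):
    """Reduce the readings received from member nodes to a single value."""
    if not readings:
        raise ValueError("no readings to aggregate")
    return AGGREGATORS[function](readings)

# Illustrative temperature readings from four member nodes.
readings = [21.5, 22.0, 20.5, 23.0]

# Without aggregation every reading travels to the base station;
# with aggregation only one value per cluster does.
packets_without_aggregation = len(readings)
packets_with_aggregation = 1

for name in AGGREGATORS:
    print(name, aggregate(readings, name))
```

In this sketch the collector forwards one packet instead of four, which is the source of the traffic and energy savings mentioned above.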
Providing security and using energy effectively [4] are essential for critical WSN applications. There are optimization problems such as meeting the speed requirements of the users while minimizing the total transmission power [5], increasing the service quality, and increasing the transmission power and speed of the system [6,7]. Using the limited energy efficiently while keeping security high for critical WSN applications can likewise be seen as an optimization problem.
DOS attacks are among the most powerful cyber weapons [8] against individuals and institutions. Therefore, many mechanisms, including a Bayesian game theory-based mechanism [8] and a DNS rule-based mechanism [9], have been proposed to reduce the impact of these attacks. In addition, attacks have been detected on datasets using techniques such as Machine Learning [10], an Adaptive Quantum Artificial Immunity System [11], and a Parallel Quantum Genetic Algorithm [12].
Performing secure data clustering that addresses confidentiality, integrity, and verification, and that also detects and defends against DOS attacks in an energy-efficient and cost-effective manner, is a practical challenge.
In this study, a comprehensive new secure data clustering protocol was developed. The following are the significant contributions of this paper:
- Attributes of a suitable data aggregation protocol and its security needs are described.
- A detailed literature review on secure data clustering is presented.
- A new cluster selection process was carried out.
- A cluster head assistant mechanism was set up, which extends the survival of the cluster heads.
- The Blowfish + EAX part of the SDA-RDOS was developed based on a new integrated data confidentiality, integrity, and authentication approach.
- RSA, used as the Partially Homomorphic Encryption scheme in this study, was employed to ensure privacy and reduce the number of operations during data clustering.
- DOS attacks, which have been neglected in secure data clustering, were taken into account. A new DOS unit consisting of a detection unit and a defense unit was developed.
- Comprehensive security was provided for data clustering.
- Energy Efficiency, Network Lifetime, Average Delay, Packet Delivery Ratio (PDR), and Security were the performance metrics used. Comparisons were made with the LSDAR, SUCID, and OOP-MDCRP protocols, and more successful results were obtained.
- A secure protocol was developed for critical WSN applications that have high security needs and consist of many sensor nodes.
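The contribution list names RSA as the partially homomorphic scheme. As a hedged illustration of what that property means, the sketch below uses textbook (unpadded) RSA with toy parameters to show that multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. The primes, exponent, and function names are assumptions chosen for readability; they are far too small for real use and are not the parameters or the exact construction of the proposed protocol.

```python
# Toy textbook RSA parameters (illustrative only, insecure at this size).
p, q = 61, 53
n = p * q                  # modulus
phi = (p - 1) * (q - 1)    # Euler totient of n
e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

def encrypt(m: int) -> int:
    """Textbook RSA encryption without padding."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Textbook RSA decryption without padding."""
    return pow(c, d, n)

a, b = 7, 12
# An aggregator can combine ciphertexts without ever decrypting them.
c_product = (encrypt(a) * encrypt(b)) % n

# Only the key holder (for example the base station) recovers the result.
print(decrypt(c_product), (a * b) % n)
```

Because the combination happens on ciphertexts, an intermediate node does not need the private key, which is the reason homomorphic schemes remove the decrypt and re-encrypt step at the cluster head. Note that the property shown here is multiplicative; unpadded RSA does not add plaintexts.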
The features and security needs of a suitable data clustering protocol are explained in Section 2. The related studies are given in Section 3, and the proposed protocol is introduced in Section 4. Section 5 includes the comparative experimental results of the developed protocol. The discussion is in Section 6, and the general results of the study are presented in Section 7.
2. Attributes of a Good Data Aggregation Protocol and Security Need
There are three elements in Data Clustering: sensor nodes, aggregator nodes, and the base station. In Figure 2, the gray nodes are the aggregator nodes and the white ones are the sensor nodes. In a large network, nodes can be randomly distributed, as in Figure 1, and this is how nodes are typically distributed in WSN applications. In a WSN with data clustering, communication takes place as follows. Cluster head nodes are selected based on specific criteria. Each sensor node sends the value it senses to the cluster head node to which it is connected. The cluster head node aggregates the data received from its member nodes with the selected aggregation function and transmits the result to the base station. If the cluster head node is located far from the base station, it transmits the aggregated data to the base station via other cluster head nodes.
Below are the points to consider regarding a suitable data clustering protocol [13].
Energy: The most critical constraint in WSNs is energy because the network stops functioning when the energy of the nodes is depleted. Therefore, all operations developed for the protocol must be energy efficient. In a WSN, network lifetime is directly proportional to the available energy.
Latency: Latency is an essential criterion for critical WSN applications. The operations of the protocol must be delay sensitive.
Scalability: Depending on the application, the number of sensor nodes can be hundreds or even thousands. The designed protocol is expected to be effective in a WSN consisting of few or many nodes.
Packet Transmission Rate: The packet transmission rate is expected to be high in WSNs. A WSN can be said to work well when this ratio is high; if it is low, there is a problem with the WSN.
Communication Overhead: If communication overhead increases in a WSN, energy and bandwidth efficiency may decrease. Therefore, additional control information should be kept proportionate.
Data Accuracy: The ratio of the total number of readings received at the base station to the total number of readings generated.
Security: One of the most critical criteria in a WSN is security. As WSNs can be installed in uncontrolled environments, they may face a security problem at any time. Complete security is expected from some critical WSN applications, such as military and health applications, because an attacker who infiltrates the network can create security vulnerabilities that put the country or people at risk, or steal critical information and use it maliciously. The following criteria are expected to be met for security in a WSN [14].
Data confidentiality: The data are not sent in the clear; they are hidden. An attacker who captures the data therefore obtains only the encrypted data, not the plaintext, and these data mean nothing to the attacker.
Data integrity: It ensures that data are not changed on the way from source to destination. For data integrity, message authentication codes can be used.
Data freshness: The freshness of the data is important. An attacker who copies old network data may replay it to the network at different times to mislead the WSN. Therefore, it should be verified that the data are up to date.
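The integrity and freshness criteria above can be sketched together in a few lines of Python using the standard `hmac` and `hashlib` modules. This is a minimal illustration under stated assumptions: the pre-shared key `KEY`, the packet layout, and the per-node counter used as a freshness check are hypothetical choices, not the mechanism of the proposed protocol.

```python
import hashlib
import hmac

KEY = b"shared-demo-key"   # hypothetical pre-shared key, for illustration only

def make_packet(node_id: int, counter: int, reading: float) -> dict:
    """Build a packet whose payload is protected by an HMAC-SHA256 tag."""
    payload = f"{node_id}|{counter}|{reading}".encode()
    tag = hmac.new(KEY, payload, hashlib.sha256).digest()
    return {"payload": payload, "tag": tag}

# Highest counter value accepted so far from each node.
last_counter = {}

def accept(packet: dict) -> bool:
    """Accept a packet only if its tag is valid and its counter is new."""
    expected = hmac.new(KEY, packet["payload"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, packet["tag"]):
        return False                      # integrity check failed
    node_id, counter, _ = packet["payload"].decode().split("|")
    node_id, counter = int(node_id), int(counter)
    if counter <= last_counter.get(node_id, -1):
        return False                      # stale or replayed packet
    last_counter[node_id] = counter
    return True

packet = make_packet(node_id=3, counter=1, reading=21.5)
print(accept(packet))   # first delivery
print(accept(packet))   # the same packet replayed
```

The tag check corresponds to data integrity, and the counter check corresponds to data freshness: a replayed packet carries a valid tag but an old counter.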
Source Authentication: Authentication establishes that a communicating node belongs to the same WSN. This way, nodes trying to join the network with a fake identity are rejected.
Data availability: It means that a WSN remains operational during DOS attacks. Although DOS attacks do not take control of the network, they considerably reduce its efficiency. Therefore, the effect of DOS attacks should be reduced by developing detection and defense units. Examples of these attacks are given below [15,16,17]. Here, DOS attacks that occur at the network layer are considered.
Sybil Attack: An attempt to control the network by creating multiple fake identities in the WSN. The malicious node identifies itself with multiple identities. As a result, the malicious node can constantly send false information to the network, leading to wrong decisions.
Figure 3 shows the illustration of a Sybil Attack.
Sinkhole Attack: The malicious node lures almost all the traffic from a given area through a compromised node, creating a metaphorical sinkhole between that area and the base station. The illustration of a Sinkhole Attack is given in Figure 4.
Blackhole Attack: In this attack, the malicious node identifies itself as the closest node to the base station and collects data from neighboring sensor nodes to be transmitted to the base station. The purpose of the malicious node is that the collected data does not reach the base station.
Figure 5 shows the illustration of a Blackhole Attack.
Wormhole Attack: In this attack, two malicious nodes establish a high-quality communication channel between each other. They then advertise this channel for routing, collecting data from neighboring sensor nodes. However, these nodes either do not transmit the data to the base station or transmit it after altering it. The illustration of a Wormhole Attack is given in Figure 6.
Selective Forwarding Attack: In this attack, the malicious node forwards some of the incoming packets and drops the others. It may decide which packets to forward based on specific criteria. For example, it can forward all packets from one node and drop all packets from another. The illustration of a Selective Forwarding Attack is given in Figure 7.
Hello Flooding Attack: The malicious node sends a HELLO packet to the nodes in the WSN, thus keeping the neighboring nodes busy.
Figure 8 shows the illustration of a Hello Flooding Attack.
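Several of the attacks listed above (Blackhole, Wormhole, Selective Forwarding) share one observable symptom: a node forwards noticeably fewer packets than it receives. The sketch below shows one generic way to flag such nodes from per-node counters. The function name, the thresholds `min_ratio` and `min_received`, and the sample statistics are assumptions for illustration; this is not the detection unit of the proposed protocol.

```python
def suspicious_forwarders(stats, min_ratio=0.8, min_received=10):
    """Return node ids whose forwarding ratio falls below min_ratio.

    stats maps node id -> (packets_received, packets_forwarded).
    Nodes with fewer than min_received packets are skipped because
    there is too little traffic to judge them.
    """
    flagged = []
    for node, (received, forwarded) in stats.items():
        if received < min_received:
            continue
        if forwarded / received < min_ratio:
            flagged.append(node)
    return sorted(flagged)

# Illustrative counters: A behaves normally, B drops selectively,
# C drops everything (blackhole-like), D has too little traffic to judge.
stats = {"A": (100, 97), "B": (100, 40), "C": (100, 0), "D": (5, 0)}
print(suspicious_forwarders(stats))
```

A fixed ratio threshold is the simplest possible rule; in practice legitimate losses from collisions or weak links would have to be accounted for before a node is blocked.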
For all these reasons, new approaches are required for security in data clustering. In an ordinary WSN without clustering, data privacy can be achieved with a symmetric algorithm: the nodes sense, encrypt the data, and transmit it to the base station, which decrypts the data and processes it. In a clustered WSN, the sensor nodes sense, encrypt the data, and transmit it to the cluster head. The cluster head decrypts the data, aggregates it according to the aggregation function, then re-encrypts the result and transmits it to the base station, which decrypts it and processes it. As can be seen, in a WSN with data clustering the data are encrypted and decrypted twice. As a result, data privacy is violated at the cluster head, where the plaintext is exposed, and additional processing time is needed.
3. Related Works
This section presents current studies on secure data clustering in a WSN. In the study [18], a privacy-preserving data clustering protocol was developed for WSNs. Two variants of the protocol, named PASKOS and PASKIS, were defined according to whether or not the base station is involved in the process. It was stated that the protocols reduce data loss and energy consumption and increase network lifetime. The study's main purpose was to guarantee the confidentiality of the nodes participating in the clustering process. It was emphasized that the proposed solutions are vulnerable to DOS attacks. The main idea of the protocol was to obtain key values using a hash function and use them in the clustering process. The protocol was compared with the basic data clustering protocol TAG and gave successful results.
In the study [
19], a secure data clustering model supported by Fog was proposed for use in health systems. Encrypted data were communicated between the collector and the Fog server. Compression methods were also used to reduce the amount of data. Simulations were carried out in the Ns2 environment. Obtained results were better than TMT, GCEDA, and SPPDA according to Storage, Communication, Transmission Rate, Energy Consumption, and Endurance criteria.
The study [20] proposed a secure clustering protocol for large-scale WSNs. It was stated that the cluster heads close to the base station perform more data-receiving, aggregating, and transmitting operations and therefore, in large-scale networks, consume more energy than the cluster heads far from the base station. It was emphasized that the lifetime of the WSN is reduced by this load imbalance among the cluster heads. In the study, a secure clustering protocol called HCR was proposed, taking into account load balancing and scalability. The proposed protocol divided the WSN into virtual circular layers. The Ant Lion Optimizer technique was also used in cluster head selection. The proposed protocol was compared with PSO-ECHS, PSO-C, and BERA in the MATLAB environment in terms of network lifetime, energy efficiency, balanced clustering, and efficiency, and better results were obtained.
The study [21] recommended a secure clustering technique that prevents selective forwarding attacks. In the study, nodes were divided into three categories: Control Node, Cluster Head, and Member Node. It was assumed that the control node knows all the operations performed by the cluster heads. Because the cluster heads were the riskiest nodes, the control node was used to protect them during the clustering process. If a cluster head was attacked, all cluster members lost their connection with the base station. The proposed technique had two stages: detection and correction. The control node tried to detect the attacking node using parameters such as data loss, delay time, and response time. The effect of selective forwarding attacks decreased with the proposed technique. The study was tested in an Ns2 environment considering criteria such as packet loss rate, false detection rate, and energy consumption.
The study [22] proposed a new model for efficient data processing and secure data clustering in a large-scale WSN. Homomorphic encryption was used in the study, which consisted of three stages. In the first stage, the WSN was divided into clusters, and cluster heads were selected according to fuzzy if-then rules. In the second stage, data confidentiality was ensured by homomorphic encryption. In the third stage, message authentication codes were used for data integrity. In existing techniques, cluster heads receive the data, decrypt it, apply the defined function, and re-encrypt it to transmit the result. Therefore, more operations are performed, and more delays may occur. For this reason, homomorphic encryption, which allows aggregation without decrypting the data and thus avoids the extra communication delay, was preferred in this study. The study was tested in an Ns2 environment with good results.
In the study [23], an algorithm that performs jamming detection during the data clustering stages was proposed for WSNs. For this purpose, the number of retransmissions of the nodes, the energy consumption per node, the time required for the network to return to a steady state, and the changes in the routing tables at the nodes were tracked. The study was implemented on a simulation platform with PEGASIS, TEEN, LEACH, and HPAR and on Zigbee and LoRa as real environments. According to the test results, the proposed algorithm can detect a node with abnormal behavior with low power consumption.
The study [24] proposed a heuristic approach for secure data clustering in a WSN. It was stated that, in real-time applications, the sensor nodes are distributed randomly in the environment, so the distances between the nodes and their transmission ranges differ. The study developed an approach suited to sensor nodes that are randomly distributed over different geographical regions with different distribution structures. In the developed approach, the WSN environment was divided into multiple virtual rings, and each ring was divided into clusters. It was tested in the MATLAB environment and gave good results.
The study [
25] proposed a data clustering technique that provides confidentiality and integrity using peer monitoring. The proposed technique considered privacy, flexibility, scalability, and efficiency criteria. Homomorphic encryption and lightweight key distribution technique were used in the study.
The study [
26] proposed a multidimensional secure clustering model for a WSN. In addition, the study proposed a trust management scheme using the binomial distribution and a secure transmission scheme considering the environment-distance-energy-security domains. The study was tested in the MATLAB environment, compared with LEACH, I-LEACH, and LEACH-TLC, with better results.
The study [
27] proposed a secure data clustering approach using the Fuzzy C-Means and ECC-ElGamal encryption algorithm. Fuzzy C-Means was used for cluster formation and cluster head selection, while the homomorphic encryption-based ECC-ElGamal encryption algorithm was used for data encryption. The proposed approach was shown to provide better security than existing ECC and RSA algorithms.
The study [28] proposed a secure clustering model that ensures confidentiality and integrity using homomorphic encryption. The proposed approach consisted of four steps: setup, encrypt-sign, collection, and verification. The proposed approach was developed on the TinyOS 2.0 simulator (TOSSIM) and PowerTOSSIM.
The study [
29] proposed a secure clustering approach in a hybrid structure. The study was based on the composition of star and tree structures. The network was geographically divided into four equal parts, with a star structure indicated in each section. Each node was assigned a parent node to transmit its data, then, the data were transmitted to the base station using the tree structure. Symmetric encryption was used in the study. The proposed method was compared with TMS, FSAMR, and EATSRA protocols in the Ns2 environment and tested according to energy consumption and data distribution latency criteria, and better results were obtained.
In the study [30], a secure clustering approach was proposed for faster detection of security threats, considering many factors in a WSN. It was stated that the proposed approach increased energy efficiency and reduced delay and calculation time.
The study [
31] proposed a secure clustering protocol based on QoS. Energy, network lifetime, and security were considered QoS parameters. In the proposed protocol, in the first place, temporary cluster heads were determined using an adaptive neuro fuzzy based clustering technique according to the criteria of residual energy, distance from the base station, and distance to neighbors. Then, among the temporary cluster heads, the most suitable ones were selected as cluster heads with the deer hunting optimization (DHO) algorithm. Finally, an intrusion detection system was developed, which uses a deep belief network to detect malicious nodes in the network. The proposed protocol was tested in MATLAB and compared with the IPSO algorithm, KHA, F5NUCP, and FUCHAR. It gave better results in terms of energy efficiency, network lifetime, packet delivery rate, average latency, and attack detection rate.
The study [32] suggested a secure clustering technique with cluster head selection. Particle Swarm Optimization and Water Wave Optimization were integrated and used in the proposed technique. To measure the performance of the technique, the number of live nodes, the coverage area, the energy balancing index, and the average remaining energy as a function of the number of rounds were considered. It was compared with DICMLA and P-SMO and gave better results.
The study [33] proposed a hybrid secure data clustering approach for WSNs. In the study, an optimal slice selection process was carried out to increase the network's performance and ensure confidentiality. Fuzzy logic was used in this process. When tested in a MATLAB environment considering low communication load, energy efficiency, and safety criteria, the study gave better results than PECDA and SMART.
The study [
34] proposed a secure data clustering protocol that can detect malicious nodes in a WSN. The tree topology was taken as a reference in the study. In the proposed protocol, the threshold value was determined for each node; it was calculated based on the number of data packets transmitted and the number of successfully received data packets. According to the threshold value, the node was either in the normal or blocked list. In this protocol, it was checked whether each node was reliable. The study was tested in an Ns2 environment.
The study [
35] recommended a privacy-protected clustering protocol using the elliptic curve cryptosystem in a WSN. The study was tested in the MATLAB environment and compared with RIX-ECDLP, ESR, and LEACH-MAC according to processing time, packet delivery rate, energy consumption, average latency, network processing load, and network lifetime, and better results were obtained.
The study [36] proposed a secure clustering approach using the Integration of Distributed Autonomous Fashion with Fuzzy If-then Rules algorithm. The cluster head can be selected by considering the energy, efficiency, and quality of the node. Packet delivery rate, dropped packet rate, residual energy level, network lifetime, and energy consumption criteria were tested, and successful results were obtained.
The study [
37] proposed a load-balanced and authentication-based clustering approach. In the study, researchers stated that the cluster heads in the active area can be overloaded compared to the cluster heads in the less active areas. This problem has not been fully taken into account in the current studies. In the study, load balancing and secure authentication were combined. The nodes’ real-time load and energy values were considered for load balancing. The study gave better results than S-LEACH, MS-LEACH, and SS-LEACH in terms of packet transmission rate, network lifetime, and compute load criteria.
In the study [
38], redundant data were eliminated with the k-means clustering algorithm for efficient data clustering in a WSN. In the study, it was stated that a network will perform more efficient data clustering by eliminating redundant data. The study proved that meaningless data can be eliminated, and data can be put together intelligently. The study was compared with the EK-means algorithm according to speed and energy criteria and gave good results.
The study [39] proposed a secure data clustering approach using the autoregressive integrated moving average model, a time series technique. It was stated that data integrity protection is not taken into account in studies that ensure data confidentiality. The study was tested considering accuracy, computational cost, and communication cost criteria. It was compared with ESDA, TAG, CPDA, and RPDA and gave better results.
The study [
40] proposed a secure data clustering model that provides query-based privacy protection in a WSN. As a result, computational complexity was reduced, and data confidentiality was ensured with homomorphic encryption by combining multiple queries in a single package. The work was tested in the MATLAB environment and proven to reduce energy consumption and protect data privacy.
The study [
41] proposed a secure data clustering approach that detects selective forwarding based on the Noise-Based Density Peaks Clustering (NB-DPC) algorithm. The study was tested in the MATLAB environment and it was observed that nodes exhibiting abnormal behavior were detected.
In the study [
42], an energy-efficient and privacy-protected secure data clustering algorithm was proposed for the WSN that consisted of three stages. The tree topology was created in the first stage, and the leaf nodes were organized. In the second step, the data collected by the leaf nodes were sliced, and the data pieces were sent to the neighboring nodes. In the last stage, data collection was carried out. The study was tested in the MATLAB environment and compared with SMART and PECDA considering communication load, energy consumption, privacy protection, and accuracy criteria, and it gave better results. It was emphasized that future work related to reducing redundant and useless data in data clustering is needed.
In the study [
43], an integrated trust-based energy-efficient data clustering approach was proposed for a WSN. Neighbors with less communication overhead were defined for each node. The route path was determined according to the confidence value gained. The Greedy Congestion Sensitive Data Collection model was used to increase the packet delivery rate. The study was tested in an Ns2 environment and compared with HRM and BTEM considering communication overhead, energy consumption, and packet delivery rate criteria, and better results were obtained.
In the study [44], a new confidence function-based approach was proposed for cluster head selection in a WSN. A threshold value was calculated for cluster head selection. The remaining energy of the node and its distance from the base station were used when calculating the threshold value. In this way, random cluster head selection was presented in the study. The study was tested in the MATLAB environment and compared with the Stable Election Protocol, considering the criteria of network lifetime, stability time, and node survival rates, and gave better results.
The study [45] proposed a secure clustering approach using Integer Matrix Keys in a WSN. The aim was to ensure data confidentiality and data integrity by placing a digital signature on the collected data and using integer matrices as keys.
The study [
46] proposed an efficient clustering approach that ensured data integrity, wherein a new key management scheme was proposed, and data integrity was ensured and verified by neighboring nodes in the environment without needing a base station. The study was tested in the TinyOS environment, compared with CMT, SAWN, SecureDAV, SDAP, and SHAN, and gave better results, considering the criteria of energy consumption and mean time to fake data detection.
The study [
47] proposed a secure clustering protocol using the A-star heuristics algorithm and one-time pad (OTP) encryption scheme. The study was tested in the NS2 environment and showed success in energy consumption, network lifetime, end-to-end latency, and packet drop rate criteria.
As can be seen from these related studies, a comprehensive, secure data clustering protocol focusing on data availability has not yet been developed. In fact, a recent review [48] noted this situation and stated that availability, among Confidentiality, Integrity, and Availability (CIA), was an inevitable research topic. In that review, current research was illustrated in Figure 16 [48], and it was stated that availability was largely neglected.
In this study, a new secure data clustering protocol focused on data availability was developed.
4. Proposed Protocol
In this section, the network model, communication model, and proposed protocol are introduced.
4.1. Network Model
The WSN covers an area of a × a square units in which N sensor nodes are deployed randomly. The base station is located at the head of the WSN and has unrestricted computational ability, storage, and battery power. All the sensor nodes have similar storage, transceiver, and battery power. It is assumed that the base station knows the location of all sensor nodes, which can be obtained from localization techniques or the received signal strength indicator value. Communication uses the CSMA/CA technique to avoid packet collisions. There are four types of nodes: the base station with unlimited energy, cluster head nodes, cluster head assistant nodes, and end nodes. End nodes are responsible for continuously sensing data and transmitting the results to cluster head nodes. Cluster head nodes, in turn, are responsible for aggregating data and forwarding it to the base station. A cluster head assistant can take over as cluster head when necessary and can coordinate between the base station and the cluster. Thanks to the cluster head assistant, the nodes' energy levels and load balance are preserved.
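The network model can be made concrete with a small Python sketch that deploys N nodes uniformly at random in an a × a field. The `Node` class, the role labels, the seed, and the numeric values (100 nodes, a side of 500 units, a base station position outside the field) are illustrative assumptions, not parameters taken from the study.

```python
import random
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    x: float
    y: float
    # Role is one of "end", "cluster_head", "cluster_head_assistant".
    role: str = "end"

def deploy(n_nodes: int, side: float, seed: int = 0) -> list[Node]:
    """Place n_nodes uniformly at random in a side x side square field."""
    rng = random.Random(seed)   # seeded so the layout is reproducible
    return [
        Node(i, rng.uniform(0, side), rng.uniform(0, side))
        for i in range(n_nodes)
    ]

SIDE = 500.0                                  # hypothetical field side length
nodes = deploy(n_nodes=100, side=SIDE)
base_station = (SIDE / 2, SIDE + 50.0)        # hypothetical position outside the field

print(len(nodes), nodes[0])
```

Every node starts as an end node here; cluster heads and cluster head assistants would be assigned afterwards by the base station.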
A typical WSN consists of randomly distributed sensor nodes. Even though WSNs are now used in most areas, one of their original aims was to establish networks in places that could not otherwise be networked, for example, on borders, near valleys, or in impenetrable woodlands. Such WSNs were formed by scattering thousands of pre-programmed sensor nodes from a height, for example from a helicopter. Consequently, the sensor nodes were dense in some areas and sparse in others. Furthermore, in real-world applications, the base station is located outside the area where the sensor nodes are deployed, not in the middle of it. The study was carried out considering these criteria.
The base station manages the operation of the network, which is important for security. A compromised cluster head can threaten a larger part of the network than a compromised ordinary node, and any control nodes created within the WSN can also be captured. Therefore, control must be exercised by the base station. The base station creates the routing table for each node in the network, forms the clusters, and tracks them. The tables are updated at specified time intervals. The base station coordinates the network together with the cluster heads and cluster head assistants. When necessary, the base station can also communicate directly with the end nodes.
4.2. Cluster Selection Process/Setup Process
In this study, the nodes were first randomly distributed in the environment; clusters were then formed by dividing the available area into
n equal parts. The first view of the network is given in
Figure 9.
For example, there are 15 split areas in
Figure 10. As can be seen, the number of nodes in each area differs, as it does in real environments. For example, there is one node in the 1st area, eight nodes in the 2nd area, 14 in the 3rd area, 11 in the 4th area, and six in the 5th area.
Table 1 lists, for each cluster, its nodes, the number of nodes, and the nodes it shares with other clusters (common nodes). The cluster nodes in each area are listed starting with the nodes closest to the center point and continuing outward. Because the nodes are listed from the inside out, the ones listed first are the cluster head candidates, and the remaining nodes lie farther out in the area. If a node falls within two areas, it is counted in the number of cluster nodes of each area and is also shown in the common nodes section.
The number of cluster nodes that each cluster in the network should contain is calculated with the help of Equation (1):
$$\text{Cluster Node Number} = \frac{\text{Total Cluster Nodes}}{\text{Total Area}} \qquad (1)$$
With 126 cluster nodes in total and a total area count of 13, the result is 126/13 ≈ 9.69. Truncating this value, the number of cluster nodes for each area should be approximately nine. The area with the base station (6) and the area with no nodes (11) are not taken into account in the Total Area term. In a network where nodes are distributed randomly, the network may not work efficiently if there is an obvious imbalance between clusters; for example, having 15 nodes in one cluster and one node in another makes the network unstable.
Next, the actual cluster of each common node must be determined so that the Cluster Node Number of each cluster approaches nine. The actual clusters of the common nodes are determined according to the flow given in
Figure 10. The flow is as follows: Starting from the first cluster (i), the common node list (j) is examined sequentially. The first common node is taken and compared, in turn, with the common nodes in the other areas. The common node is then assigned to whichever of its clusters has the lower NCN and is deleted from the other clusters. If the NCN of the current cluster is less than or equal to the mean, the common node is left in this area and deleted from the others. The NCN of each cluster from which the common node is deleted is reduced by one. The flow continues until the actual clusters of all common nodes are determined. After these operations, the nodes of any cluster left with only a few nodes are included in the nearest cluster.
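The flow above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: NCN is read here as the current number of nodes in a cluster, the function name `resolve_common_nodes` and the dictionary layout are hypothetical, and the two rules are combined so that a common node stays in the current cluster when that cluster's NCN is at or below the mean and otherwise goes to the sharing cluster with the lowest NCN. The final merging of very small clusters into the nearest cluster is omitted.

```python
def resolve_common_nodes(clusters, mean_ncn):
    """Assign every common (shared) node to exactly one cluster.

    clusters: dict mapping cluster id -> set of node ids (hypothetical layout).
    mean_ncn: target number of nodes per cluster, e.g. 9 in the example above.
    """
    # Work on a copy so the caller's data is left untouched.
    clusters = {cid: set(nodes) for cid, nodes in clusters.items()}
    for cid in sorted(clusters):                      # clusters in order (i)
        for node in sorted(clusters[cid]):            # candidate nodes (j)
            owners = [c for c in sorted(clusters) if node in clusters[c]]
            if len(owners) < 2:
                continue                              # not a common node
            if len(clusters[cid]) <= mean_ncn:
                keep = cid                            # current cluster keeps it
            else:
                # Assumption: otherwise the least populated sharing cluster wins.
                keep = min(owners, key=lambda c: len(clusters[c]))
            for c in owners:
                if c != keep:
                    clusters[c].discard(node)         # NCN of c drops by one
    return clusters


example = {1: {1, 2, 3, 4, 5}, 2: {5, 6}, 3: {6, 7, 8}}
balanced = resolve_common_nodes(example, mean_ncn=3)
```

In this toy input, node 5 moves out of the oversized cluster 1, and node 6 stays in cluster 2 because that cluster is below the mean.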
After the operations, the final state of the network is given in
Figure 11 and the final state of the nodes is given in
Table 2.
In this way, clusters are determined in the network. As a result, energy and load balancing in the network are positively affected since the number of nodes of all clusters is brought close to each other.
4.3. EAX
The authors who developed OCB analyzed CCM in five different categories, namely efficiency, parameterization, complexity, variable-tag-length subtleties, and some wrong security claims, and revealed many disadvantages [
49]. As a result, the authors have developed a new mode of operation called EAX, which preserves the main features of CCM and eliminates the disadvantages they have stated. The EAX mode [
50] was submitted on 3 October 2003, to the attention of NIST in order to replace CCM as the standard AEAD mode of operation as the CCM mode lacks some desirable attributes of EAX and is more complex. The representation of EAX is given in
Figure 12.
The EAX mode (encrypt-then-authenticate-then-translate) is a mode of operation for cryptographic block ciphers. It is an Authenticated Encryption with Associated Data (AEAD) algorithm designed to provide both authentication and privacy of the message (authenticated encryption) with a two-pass scheme: one pass achieves privacy and the other achieves authenticity for each block. In previous studies, different modes of operation were compared, and the EAX mode was found to be a good choice for WSNs [
51,
52,
53]; therefore, it was chosen in this study.
4.4. Blowfish
Blowfish [
54] is a symmetric block cipher designed by Bruce Schneier in 1993 as an alternative to DES. It is significantly faster than DES and provides a reasonable encryption rate, with no effective cryptanalysis technique found to date. The Blowfish algorithm is executed in three steps: generation of subkeys, initialization of substitution boxes, and encryption.
Step 1: 18 subkeys {P[0]…P[17]} are needed in both the encryption and the decryption process, and the same subkeys are used for both. These 18 subkeys are stored in a P-array, with each array element being a 32-bit entry.
Step 2: 4 substitution boxes (S-boxes) {S[0]…S[3]} are needed in both the encryption and the decryption process, with each S-box having 256 entries {S[i][0]…S[i][255], 0 ≤ i ≤ 3}, where each entry is 32 bits.
Step 3: The encryption function consists of two parts. Rounds: The encryption consists of 16 rounds, with each round (Ri) taking as inputs the output of the previous round (the plaintext, P.T., for the first round) and the corresponding subkey (Pi). Every round r consists of four actions: Action 1: XOR the left half (L) of the data with the r th P-array entry. Action 2: Use the XORed data as input for Blowfish’s F-function. Action 3: XOR the F-function’s output with the right half (R) of the data. Action 4: Swap L and R. Post-processing: After the 16 rounds, the last swap is undone, and the two halves are XORed with the two remaining P-array entries, P[16] and P[17]. In previous studies, different encryption algorithms were compared, and the Blowfish algorithm was found to be a good choice for WSNs [
55,
56,
57]; therefore, it was chosen in this study.
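The round structure described in Step 3 can be illustrated with a short Python sketch. It is a sketch of the Feistel structure only, under loud assumptions: the P-array and S-boxes are filled from a seeded pseudo-random generator (`make_toy_tables` is a hypothetical helper), whereas real Blowfish derives them from the digits of pi and a key schedule. The output is therefore neither compatible with Blowfish nor secure.

```python
import random

MASK32 = 0xFFFFFFFF


def make_toy_tables(seed=0):
    """Toy stand-in for the key schedule: 18 subkeys and 4 S-boxes of 256 words."""
    rng = random.Random(seed)
    p = [rng.getrandbits(32) for _ in range(18)]
    s = [[rng.getrandbits(32) for _ in range(256)] for _ in range(4)]
    return p, s


def f(s, x):
    """Blowfish F-function: split x into four bytes and mix the S-box lookups."""
    a, b, c, d = (x >> 24) & 0xFF, (x >> 16) & 0xFF, (x >> 8) & 0xFF, x & 0xFF
    return ((((s[0][a] + s[1][b]) & MASK32) ^ s[2][c]) + s[3][d]) & MASK32


def encrypt_block(p, s, left, right):
    """Encrypt one 64-bit block given as two 32-bit halves."""
    for r in range(16):
        left ^= p[r]                 # Action 1: XOR L with the r-th subkey
        right ^= f(s, left)          # Actions 2 and 3: R = R xor F(L)
        left, right = right, left    # Action 4: swap the halves
    left, right = right, left        # post-processing: undo the last swap
    right ^= p[16]
    left ^= p[17]
    return left, right


def decrypt_block(p, s, left, right):
    """Decryption is the same network with the subkeys in reverse order."""
    return encrypt_block(p[::-1], s, left, right)


p_array, s_boxes = make_toy_tables()
cipher = encrypt_block(p_array, s_boxes, 0x01234567, 0x89ABCDEF)
plain = decrypt_block(p_array, s_boxes, *cipher)
```

Decrypting the ciphertext halves returns the original pair, which is the property that lets the same subkeys serve both directions.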
The Blowfish + EAX part of SDA-RDOS was developed based on a new integrated approach. Its main idea is that the Blowfish encryption algorithm is restructured to operate within the EAX mode. The Blowfish + EAX portion of the SDA-RDOS schema is depicted in
Figure 13.
The Blowfish + EAX part of SDA-RDOS consists of eight steps. These are listed below:
First Step: The entered message is divided into blocks of 64 bits (M[1], M[2], …, M[m]). If the last message block is shorter than 64 bits, the remaining bits are filled with 0’s.
Second Step: A random NONCE value is generated and processed with the OMAC algorithm, giving N’.
Third Step: The N’ value is encrypted in CTR mode, and the result is applied to the first message block. This process continues sequentially until every message block has been processed; that is, each message block is XORed with the corresponding CTR output, with as many CTR operations as there are blocks.
Fourth Step: The result obtained is encrypted with the Blowfish algorithm, yielding the encrypted message blocks.
Fifth Step: The encrypted message is processed with the OMAC algorithm, giving C’.
Sixth Step: The H value is processed with OMAC, giving H’.
Seventh Step: The N’, H’, and C’ values are XORed. The selected t bits of the result are used as the tag. The generated tag value is then used for data authentication.
In the last step, all the values are encrypted, and the encrypted message is produced together with the tag value.
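Two of the steps above, the block splitting of the first step and the tag construction of the seventh step, can be sketched in Python. This is a minimal sketch under assumptions: the function names are hypothetical, the three OMAC outputs are taken as ready-made byte strings rather than computed, and the "selected t bits" are assumed to be the leading t bits with t a multiple of 8.

```python
BLOCK_BYTES = 8  # 64-bit blocks, as in the first step


def split_into_blocks(message: bytes) -> list:
    """Split a message into 64-bit blocks, zero-filling the last block."""
    remainder = len(message) % BLOCK_BYTES
    if remainder:
        message += b"\x00" * (BLOCK_BYTES - remainder)
    return [message[i:i + BLOCK_BYTES] for i in range(0, len(message), BLOCK_BYTES)]


def make_tag(n_mac: bytes, h_mac: bytes, c_mac: bytes, t_bits: int) -> bytes:
    """XOR the three OMAC outputs (N', H', C') and keep the leading t bits."""
    full = bytes(a ^ b ^ c for a, b, c in zip(n_mac, h_mac, c_mac))
    return full[: t_bits // 8]


blocks = split_into_blocks(b"0123456789")
tag = make_tag(b"\x0f" * 8, b"\xf0" * 8, b"\xff" * 8, t_bits=32)
```

Note that zero filling alone does not record the original message length, so a receiver would need the length from elsewhere to strip the padding.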
4.5. RSA—Partial Homomorphic Encryption
Data clustering is a structure that enables the sensors in the network to save energy by reducing the amount of data transmitted. To avoid security problems, the party sending the data should encrypt them before sending them to the base station. However, the sensors on the path of the data must then convert the data to clear text in order to perform the data clustering operation. Security and data clustering thus work in opposition to each other [
58]. In the conventional approach, the party sending the data to the base station encrypts them with symmetric key encryption. The sensors on the path decrypt the data, cluster them, and then encrypt them again before forwarding. The data lose their confidentiality during these processes [
58]. A Homomorphic Encryption structure can be used both to ensure data confidentiality and to perform data clustering. With Homomorphic Encryption, the party sending the data to the base station encrypts the data, and the sensors on the path perform the data aggregation process and forward the data without decrypting them. As the encrypted data are not opened along the way, data confidentiality is ensured while data clustering is still performed [
58].
Homomorphic Encryption [
59,
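60] is a structure that allows operations to be performed directly on encrypted data; in its full form, different types of operations can be applied an unlimited number of times rather than a single operation type. A homomorphic cipher can basically be thought of as a ring homomorphism. For example, let A be the plaintext space, B the ciphertext space, Enc() the encryption algorithm, and Dec() the decryption algorithm. Accordingly, the encryption process can be considered a function defined from the ring A to the ring B [
61]:
$$\mathrm{Enc}(\cdot): A \rightarrow B$$
$$\forall\, m_1, m_2 \in A: \quad \mathrm{Enc}(m_1 + m_2) = \mathrm{Enc}(m_1) \oplus \mathrm{Enc}(m_2), \qquad \mathrm{Enc}(m_1 \times m_2) = \mathrm{Enc}(m_1) \otimes \mathrm{Enc}(m_2)$$
where ⊕ and ⊗ denote the corresponding operations in B. An encryption scheme with this property can be described as a Homomorphic Encryption structure.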
Fully Homomorphic Encryption supports addition and multiplication operations simultaneously and an unlimited number of times. If the contents of two or more ciphertexts are added or multiplied, the plaintext obtained by decrypting the result should be the same as if the operations had been performed on the unencrypted data. The partially homomorphic encryption (PHE) method was preferred in this study because the communication costs required by fully homomorphic encryption methods are very high relative to the resources of a WSN. Furthermore, partial HEs are faster than fully HEs, and their ciphertext size is smaller. Partial homomorphic encryption supports computation on only some mathematical operations but is highly efficient for practical applications. Most PHE schemes support one type of operation.
The homomorphic property of RSA [
62,
63,
64,
65,
66], which was preferred as the partial HE scheme for this study, was introduced later by Rivest, Adleman, and Dertouzos using the term “privacy homomorphism” [
67], which was an early example of PHE. The RSA scheme involves four algorithms as follows:
Keygen Algorithm: The public key is a pair of integers (n, e), where n = pq, p and q are large primes, and e is chosen such that gcd(e, φ(n)) = 1, where φ(n) = (p − 1)(q − 1); that is, e is invertible (mod φ(n)). The secret key is (d, n), where d is the inverse of e modulo φ(n) (i.e., ed ≡ 1 (mod φ(n))).
Encryption Algorithm: First, the message is converted into a plaintext m ∈ Zn; the ciphertext c is then computed as follows:
$$c = E(m) = m^{e} \bmod n$$
where the ciphertext c ∈ Zn.
Decryption Algorithm: Takes the secret key (d, n) and the ciphertext c and recovers the plaintext as
$$m = D(c) = c^{d} \bmod n$$
because d is the multiplicative inverse of e modulo φ(n), that is, ed ≡ 1 (mod φ(n)).
Homomorphic Property: For m1, m2 ∈ Zn,
$$E(m_1) \cdot E(m_2) = m_1^{e}\, m_2^{e} \bmod n = (m_1 m_2)^{e} \bmod n = E(m_1 \times m_2)$$
As we can see, the homomorphic multiplication property of RSA can evaluate E(m1 × m2) directly from E(m1) and E(m2) without decrypting them. RSA was chosen because it is more efficient than other PHEs [
68,
69,
70].
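The multiplicative property can be checked numerically with a small Python sketch. It is textbook RSA with deliberately tiny primes (p = 61, q = 53, e = 17 are illustrative values, not parameters from this study), and the function names are hypothetical. Textbook RSA without padding is shown only to illustrate the homomorphism.

```python
from math import gcd


def keygen(p, q, e):
    """Build an RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi(n)")
    d = pow(e, -1, phi)          # modular inverse of e (Python 3.8+)
    return (n, e), (d, n)


def encrypt(public_key, m):
    n, e = public_key
    return pow(m, e, n)          # c = m^e mod n


def decrypt(private_key, c):
    d, n = private_key
    return pow(c, d, n)          # m = c^d mod n


public_key, private_key = keygen(61, 53, 17)
c1 = encrypt(public_key, 7)
c2 = encrypt(public_key, 12)

# Multiply the ciphertexts only; no decryption happens at this point.
c_product = (c1 * c2) % public_key[0]

print(decrypt(private_key, c_product))  # prints 84, i.e. 7 * 12
```

The product is recovered correctly only while m1 × m2 stays below n; larger products wrap around modulo n.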
4.6. DOS Mode
The DOS unit consists of two parts: the detection unit and the defense unit.
Figure 14 shows the representation of the DOS unit.
Detection Unit: This unit works to reveal whether something is wrong with the network. If a DOS attack occurs during the operation of the network, the following symptoms may be observed: the packet drop rate increases, the packet transmission rate decreases, energy consumption increases, and latency increases.
As explained earlier, the base station monitors this information for each cluster. If a value crosses its threshold, an attack is detected. The base station also monitors the clusters and nodes directly. A difference between the values reported by the nodes or cluster heads and those held by the base station indicates a problem. Central management is placed at the base station because its energy is considered unlimited. The average values from all cluster heads in the environment are checked. If the mean values of a cluster head deviate, data are requested from all nodes in that cluster for verification, and the incoming data are processed to check whether there is an attack.
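The threshold check performed by the detection unit might look like the following Python sketch. The class `ClusterMetrics`, the function `dos_indicators`, the field names, and all numeric limits are hypothetical; the source does not give concrete threshold values.

```python
from dataclasses import dataclass


@dataclass
class ClusterMetrics:
    drop_rate: float          # fraction of packets dropped
    transmission_rate: float  # packets delivered per second
    energy_per_round: float   # joules consumed per round
    latency_ms: float         # average delay in milliseconds


def dos_indicators(observed: ClusterMetrics, limits: ClusterMetrics) -> list:
    """Return the names of the indicators that crossed their threshold."""
    flags = []
    if observed.drop_rate > limits.drop_rate:
        flags.append("drop_rate")
    if observed.transmission_rate < limits.transmission_rate:
        flags.append("transmission_rate")   # this one alarms when it falls
    if observed.energy_per_round > limits.energy_per_round:
        flags.append("energy_per_round")
    if observed.latency_ms > limits.latency_ms:
        flags.append("latency_ms")
    return flags


# Illustrative limits only.
limits = ClusterMetrics(drop_rate=0.10, transmission_rate=50.0,
                        energy_per_round=0.5, latency_ms=200.0)
suspect = ClusterMetrics(drop_rate=0.35, transmission_rate=12.0,
                         energy_per_round=0.4, latency_ms=450.0)
flags = dos_indicators(suspect, limits)
```

A non-empty result would prompt the base station to request data from every node of the affected cluster, as described above.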
Defense Unit: For defense, Packet Leash, Authorization, Authentication, Monitoring, Data accuracy, and Sleeping methods [
71,
72,
73] are used.
Packet Leash: The base station initially determines the maximum number of hops over which each packet may be forwarded. If a packet exceeds this number of hops, it is dropped. In this way, a limit is placed on the circulation of the packet in the network.
Authorization: All clusters, cluster heads, cluster head assistants, and cluster nodes are registered at the base station. All nodes are centrally managed and authorized by the base station, which can warn clusters based on its calculations. Each node in the network has an id and is authenticated by the cluster head and by its neighbors, namely the cluster head assistants. No packets are accepted from any node whose id is not registered in the network. Threshold values for the packet transmission/drop rate, energy consumption, and delay parameters are determined, and nodes that cross the threshold values are excluded from communication for a certain period.
Authentication: During the network setup, codes are embedded in each node that cannot be read even by someone who takes over the node. The verification unit uses these codes. If the code obtained after a process matches the code that was sent, there is no problem with the node. The authentication mechanism between the base station and the nodes is as follows: The base station requests a message from the node it wants to authenticate. Node A encrypts the message (ID, etc.) with its private key, followed by the symmetric key, and sends it to the base station. The base station decrypts the incoming message with its private key and authenticates A, then decrypts the remainder with A’s public key. If this succeeds, node A is validated. These transactions can also take place between two nodes or between the cluster head and cluster member nodes. The identity of each node in the cluster with the problem is authenticated. The problematic node is removed from the routing lists, and updated notifications are transmitted to the nodes.
Monitoring: In the cluster with the problem, the cluster head is replaced first, on the assumption that the cluster head has been exposed to the attack. Member nodes transmit their data to the new cluster head, which in turn transmits them to the base station. If the threshold values are no longer violated, the previous cluster head may have been attacked, and it is not used for a certain time. If the problem persists, data are collected and sent to the base station repeatedly, with a single member node left out each time. According to the incoming values, the node that has a problem is removed from the routing lists or not used for a certain period.
Data accuracy: The nodes in the cluster transmit the data they detect to the cluster heads. The cluster heads, in turn, sum or average these data and transmit the result to the base station. At specified time intervals, the base station requests data separately from the relevant cluster head and from the related cluster member nodes to verify the data accuracy. It ensures data accuracy by comparing the incoming data. In this way, wrong decisions are avoided in the network.
Going to sleep: In some DOS attacks, messages are sent continuously only to keep the node busy, rather than to hijack data or send false information to the network. The aim is to prevent the node from transmitting the data it needs to send and to drain its energy. Because the attacking node also has limited energy, the targeted node can be put to sleep as a defense mechanism.
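Two of the defense methods above, the packet leash and the data accuracy check, lend themselves to a compact Python sketch. The packet layout (a dictionary with a `hops` field), the function names, and the tolerance value are hypothetical choices for illustration; the source does not specify them.

```python
def forward(packet: dict, max_hops: int):
    """Packet leash: increment the hop count, or drop the packet (return None)
    once it would exceed the limit set by the base station."""
    hops = packet["hops"] + 1
    if hops > max_hops:
        return None
    return {**packet, "hops": hops}


def aggregate_is_consistent(reported_mean: float, member_values: list,
                            tolerance: float) -> bool:
    """Data accuracy: compare the average reported by a cluster head with the
    average of the values the base station requested from the member nodes."""
    actual_mean = sum(member_values) / len(member_values)
    return abs(actual_mean - reported_mean) <= tolerance


relayed = forward({"id": 1, "hops": 2}, max_hops=3)
dropped = forward({"id": 1, "hops": 3}, max_hops=3)
honest = aggregate_is_consistent(20.0, [19.0, 21.0, 20.0], tolerance=0.5)
tampered = aggregate_is_consistent(25.0, [19.0, 21.0, 20.0], tolerance=0.5)
```

A mismatch in the second check would point to the cluster head or to one of the member nodes, which is where the monitoring steps above take over.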
4.7. Communication
In clustered networks, an attack on or hijacking of the cluster head affects all cluster nodes, which further increases the importance of cluster heads. In this study, cluster head assistants were created to prevent this situation, balance the energy consumption of the cluster heads, and obtain information about the cluster. In the lists given in
Table 2, the first node of each cluster is assigned as the cluster head, the following two nodes as cluster head assistants, and the remaining nodes as cluster member nodes. Information about the 2nd, 7th, and 10th clusters is given in
Table 3.
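The role assignment described above reduces to slicing each cluster's ordered node list, as in this small Python sketch. The function name `assign_roles`, the dictionary keys, and the node ids are hypothetical; the list is assumed to be ordered from the center of the area outward.

```python
def assign_roles(ordered_nodes: list) -> dict:
    """Split one cluster's ordered node list into head, assistants, members."""
    return {
        "head": ordered_nodes[0] if ordered_nodes else None,  # first node
        "assistants": ordered_nodes[1:3],                     # next two nodes
        "members": ordered_nodes[3:],                         # everything else
    }


# Illustrative node ids, not taken from the tables of the study.
roles = assign_roles([14, 3, 27, 8, 41, 5])
```

Slicing keeps the sketch safe for clusters with fewer than three nodes, where the assistant or member lists simply come back shorter or empty.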
After the network is established, the base station creates the routing table for each node and then transmits this information to the relevant nodes. In this way, the nodes in the network transmit the data they detect only to the specified target nodes. For example, the routing table of cluster 2 is given in
Table 4.
When the base station receives data from all cluster heads, it checks them together. It records the number of packets from each cluster over time. It monitors the energy levels of the nodes, taking into account the number of incoming packets and the communication density. The base station requests data from the cluster nodes at specified intervals. The routing information can only be changed by the base station.
As seen in
Figure 15, the communication of the base station with the cluster heads or cluster head assistants is again protected with Blowfish + EAX. The Blowfish + EAX model was applied here in a new way to ensure data confidentiality and integrity in the setup and initial communication between the base station and the nodes. Confidentiality and security must also be ensured while the subsequent behavior of the network is being determined, since this process is essential for the network. The RSA homomorphic method was used in data clustering; it was preferred because no repeated decrypt-encrypt operations are needed during the clustering phase. During the data collection phase, member nodes perform detection operations. The detected values are averaged at the cluster head nodes and transmitted to the base station. Thanks to the central control carried out by the base station, the security and coordination of the network are strengthened.