A Research Approach to Port Information Security Link Prediction Based on HWA Algorithm

Xia, Zhixin; Zheng, Zhangqi; Bai, Lexin; Yang, Xiaolei; Liu, Yongshan

doi:10.3390/app142210646

Open AccessArticle

A Research Approach to Port Information Security Link Prediction Based on HWA Algorithm

by

Zhixin Xia

^1,2,

Zhangqi Zheng

³,

Lexin Bai

¹,

Xiaolei Yang

¹

and

Yongshan Liu

^1,*

¹

School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China

²

Hebei Port Group Co., Ltd., Tangshan 063200, China

³

School of Mathematics and Information Technology, Hebei Normal University of Science & Technology, Qinhuangdao 066004, China

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(22), 10646; https://doi.org/10.3390/app142210646

Submission received: 23 September 2024 / Revised: 28 October 2024 / Accepted: 14 November 2024 / Published: 18 November 2024

Download

Browse Figures

Versions Notes

Abstract

:

For the protection of information security, link prediction, as a basic problem of network science, has important application significance. However, most of the existing link prediction algorithms rely on the node information of the graph structure, which is not applicable in some graph structure data involving privacy. At the same time, most of the algorithms only consider the general graph structure and do not fully consider the high-order information in the graph. Because of this, this paper proposes an algorithm called hypergraph-based link prediction with self-attention (HWA) to solve the above problems. The algorithm can obtain hypergraphs without knowing the attribute information of hypergraph nodes and combines the graph convolutional network (GCN) framework to capture node feature information for link prediction. Experiments show that the HWA algorithm proposed in this paper, combined with the GCN framework, shows better link prediction performance than other graph-based neural network benchmark algorithms on eight real networks. This further verifies the validity and reliability of the model in this paper and provides new protection ideas and technical means for information security.

Keywords:

HWA algorithm; hypergraph; information security; residual connection

1. Introduction

With the continuous evolution of network attack methods, the importance of information security is becoming increasingly prominent [1,2]. As an important part of the national economy, the information security of ports is crucial. With the development of port intelligence, data link security becomes particularly important. Currently, ports are facing increasingly sophisticated cyber-attacks that may pose a serious threat to port operations and national security [3,4,5,6]. For example, ports in various countries have suffered from frequent hacking attacks in the past few years, which have caused severe economic losses and operational disruptions. To cope with this challenge, ports need to adopt effective cyber-security protection measures. However, traditional cyber-defense techniques are lacking, and port systems are facing stronger cyber-attack threats. In the field of network security, vulnerability detection technology plays a crucial role. However, relying solely on these technologies is not sufficient to fully guarantee information security. Therefore, it is necessary to conduct a comprehensive security assessment of the port network system and strengthen the security monitoring and detection capabilities of data links. At the same time, it is necessary to enhance the identification and response capabilities of network attacks in order to promptly respond to potential network threats and ensure the stability and security of port network systems. This multi-level protection strategy will more effectively maintain the overall security situation of the port network.

Complex networks exhibit complex structural relationships between nodes and links, effectively representing a large amount of unstructured data in real life. Link prediction, as a key issue in complex network research, utilizes observable network structures to explore the basic principles of network formation, predict possible future links, or identify missing links in the network. This not only helps to understand the evolution and stability of networks deeply but also provides important ideas for solving practical problems in reality. Port systems can be described as complex networks [7,8,9], where nodes represent entities and interactions between entities can be described as edges or links. Link prediction in networks refers to how to predict the possibility of generating a connection between two nodes in a network that have not yet generated a connected edge by using known information about the network structure and other information, which helps to excavate the network structure and evolution. With the wide application of complex networks in modern society, link prediction has become an important network analysis technique with far-reaching practical applications. Intending to make the link prediction model better adapted to different types of complex networks and improve the prediction ability of potential connections in the network, the study of link prediction with the help of graph neural networks combined with higher-order topology information provides a more in-depth and effective method for understanding and applying complex networks.

Ensuring accurate acquisition of node information in link prediction is a challenging task, especially when relying solely on the structural information of the network. Taking the port system network applied in this article as an example, considering the possible privacy protection measures, direct access to certain key information is restricted, which increases the complexity and challenge of information acquisition. Traditional hypergraph models often overly rely on node attribute information and overlook the importance of network topology. Therefore, it is urgent to further explore how to effectively integrate network structures and attribute information to improve the accuracy and robustness of link prediction models.

Although the use of complex networks in link prediction can better apply technology to real life, and although the study of high-order topologies can improve the accuracy and robustness of link prediction, there are still some problems: 1. In some current practical applications, the relationship between nodes can be extremely complex, and the edges of the general graph can only be connected to two nodes. It is difficult to accurately represent many-to-many, many-to-one and other complex relationships; thus, the general graph can not meet the needs; 2. Traditional feature extraction methods only focus on the local neighborhood information of nodes, and they cannot effectively capture the information in the global graph structure, and for large-scale graphs, the traditional feature extraction methods increase the computational and storage costs and reduce the generalization ability of the model intending to extract high-dimensional feature vectors; 3. In some cases, link prediction must rely on network structure information. Therefore, it is important to obtain accurate node attributes. However, in a real network, the information of many users is private, and it is difficult to obtain node information.

Given the above problems, the purpose of the method in this study is to extract and apply the high-order information existing in the graph structure, and this paper starts from the initial input data of the experiment. Firstly, the HWA algorithm is proposed to construct the hypergraph. This involves constructing a hypergraph through a general graph with basic information about the network structure. Secondly, based on the GCN model proposed in this paper, the feature vectors of nodes are extracted according to the input hypergraph correlation matrix. At the same time, residuals are added to solve the problem of gradient disappearance due to the increase in the graph data. Finally, MLP is used for link prediction, and the model is evaluated by AUC and F1 values to comprehensively evaluate the performance and effects of the model and to verify the effectiveness and reliability of the proposed method.

The sections of this paper are organized as follows. Section 2 introduces an overview of the foundational ideas that support this article. Section 3 reviews in detail the related work on graph convolutional neural networks and multilayer perceptrons. Section 4 describes the HWA data modeling method and the node feature extraction method proposed in this paper. Section 5 focuses on the experimental steps and experimental results. The conclusions of this paper are presented in Section 6.

2. Related Work

In response to increasingly complex social and information networks, researchers have proposed numerous link prediction methods in order to comprehensively understand and utilize these intricate networks. Qiao Lilin et al. [10] proposed a key node recognition algorithm, KNIA-MWF, based on multi-attribute weighted fusion. This algorithm combines the objective entropy weight method (EWM) and subjective analytic hierarchy process (AHP) to quantitatively analyze the impact of different attributes on node importance and to comprehensively evaluate the structural importance of nodes in the network through the final topological importance (FTI). This provided important ideas for link prediction in subsequent research. Kuanyang Li et al. [11] proposed an integrated model-based link prediction algorithm (EMLP) based on the idea of the integrated model, combined with logistic regression and superposition techniques, to improve the stability and accuracy of link prediction, and they experimentally proved that the link prediction of this method has better stability and accuracy. Jing Guo [12] proposed a single-mechanism link prediction algorithm based on node mechanism similarity, a hybrid link prediction algorithm based on an information entropy improved principal component analysis model and a hybrid link prediction algorithm based on ensemble learning. The reliability and stability of the prediction accuracy of the proposed three algorithms in the network were obtained by experimental tests. Sai Munikoti et al. [13] proposed a framework based on extensible and universal graph neural networks. The model learns the critical scores of nodes/links on a small representative subset of nodes/links and predicts large graphs by small graphs. The critical scores of nodes/links in large graphs are used to prove the scalability and accuracy of the framework. Zhiwei Zhang et al. [14] proposed an edge-convolution-based graph neural network called EdgeConvNorm for link prediction. The Hadamard product of the final learned link representation is used to determine the link between two nodes via a binary classifier sigmoid. Intending to solve the over-smoothing problem of the model, a normalization strategy for the link representation is proposed. Yuxuan Xiu et al. [15] proposed an extended network self-representation (ENSR) model based on the absence of predefined assumptions about the mechanism of network link formation, which produces an optimization model of the optimal ENSR function by minimizing the Frobenius paradigm between the network adjacency matrix and its ENSR transform. Moreover, the ENSR model performs well in link prediction and can adaptively discover potential link formation mechanisms in complex networks. Yanan Wang [16] proposed a link prediction model called DME based on neighbor contribution and path propagation for the intra-layer link prediction of single-layer networks. Aiming at the inter-layer link prediction of a two-layer network, a link prediction model called SWLEV based on the weighted embedding of edges in the network layer is proposed. By combining edge betweenness and degree centrality, an edge-weighted model is proposed, and an appropriate embedding method is added. The network structure is captured by combining the first-order proximity of the edge strength and the second-order proximity of the number of common neighbors to reduce the dimension of the network. These approaches demonstrate rich exploration and innovation in link prediction for complex networks. From integrated models to graph neural networks, as well as models based on mechanisms without assumptions, all of them meet the experimental expectations. In intricate networks, the in-depth understanding of higher-order structural information is of great significance in solving practical problems.

With the wide application of complex networks, link prediction has become an important task. Graph neural networks have been adopted by many scholars for link prediction due to their advantages of effectively capturing topology information, fusing node attributes, end-to-end learning, applicability to networks of different sizes and effective modeling of contextual information. However, most of the current link prediction methods mainly focus on low-order networks, ignoring the expression of complete semantic information between multiple entities, and, thus, there is a need to identify high-order topological structure information to more adequately capture the topological nature and complex associations of relational data. Tianyu Cheng [17] proposed a higher-order link prediction algorithm based on higher-order structural information around nodes, which uses biased random wandering to abstract the higher-order structural information around nodes and maps the wandering sequences of the nodes into low-dimensional feature vectors using a representation learning approach to capture the higher-order structural features around nodes. To cope with the effect of network disconnection, an algorithm based on edge relation matrix decomposition is proposed to combine non-negative matrix decomposition with higher-order link prediction to capture the similarity between nodes and edges in order to synthesize the likelihood of generating higher-order structures between nodes and edges in the network. Tao Yi et al. [18] proposed an efficient primitive neighbor matrix learning algorithm for extracting higher-order structures in directed networks and constructing multiple primitive neighbor matrices, as well as to study a new encoder scheme consisting of three modules: node embedding, information fusion and link prediction. Ru Huang et al. [19] proposed a link prediction model based on the Weisfeiler–Lehman simplex neural network (WL-SNN), which represents higher-order topological information by simplex features and uses the WL algorithm to construct the local neighborhood to extract higher-order information, in which a third-order Laplace operator is set to capture the simplex features. In addition, a two-channel neural network structure consisting of a simplex neural network and graph convolutional neural network is constructed. Yifan Lu et al. [20] proposed a heterogeneous hypergraph neural network (NOH) for sensing neighborhood overlap, which learns higher-order semantics while maintaining the original topology. It adaptively combines a heterogeneous hypergraph variational self-encoder with a neighborhood-overlapping perceptual graph neural network to take into account the heterogeneity of the graphs and learn the structural information of the graphs and estimate the overlapping neighborhoods for link prediction. Wenqiang Liu et al. [21] proposed a knowledge graph embedding method called KANE, inspired by graph convolutional networks, to express high-order structural relationships between entities. Yuyuan Ren [22] designed a prediction method with strong universality by making full use of the edge connection principle of a heterogeneous, attribute and time-series supernetwork. Aiming at the agglomeration characteristics of different links in heterogeneous supernetworks, a supernetwork link prediction model based on a hypergraph clustering parser is proposed. Kang Fu et al. [23] proposed two methods for link prediction, H-CCNC and FH-CCNC. These two methods combine node centrality and high-order clustering coefficients to predict missing links in the network from both global and local perspectives, demonstrating the potential to significantly improve prediction accuracy on multiple real datasets. These proposed high-order link prediction algorithms fully consider the high-order structural information around the nodes in the network and use different methods to capture and utilize this information to improve the accuracy and robustness of link prediction.

3. Preliminaries

3.1. Hypergraph

In traditional graphs, each edge can only connect two vertices, which may not be sufficient when expressing complex relationships. Hypergraphs break through this limitation by allowing an edge (called a “hyperedge”) to connect multiple vertices [24]. Specifically, a hypergraph consists of a set of vertices and a set of hyperedges, where each hyperedge is a subset of the set of vertices. This enables hypergraphs to more naturally represent many-to-many relationships. For example, in social networks, a group of users can be viewed as a hyperedge that contains multiple users (vertices) with complex interactive relationships between them. By using hypergraphs, we can more intuitively describe these relationships, allowing for a deeper analysis of social networks. In this experiment, we consider port information security as a complex social network and use hypergraphs to construct its intricate structure. Hypergraphs are used to accurately represent the relationships between ports and the information transmission and interaction patterns within ports. With the help of hypergraphs, we can capture higher-order correlations between more ports, which helps to comprehensively understand the complexity of port information security. This method not only improves the accuracy of port information security modeling but also provides strong support for subsequent analysis and prediction. The difference between hypergraphs and traditional graphs is shown in Figure 1.

Hypergraph

H

is an ordered binary

H = (X; E = {(e_{i})}_{i \in I})

(

I

is a limited identifier set), where

X

is a non-empty set of vertices, called a vertex set;

E

is a non-empty subset cluster of

X

, whose elements are called hyperedges.

Therefore, if

P (X)

is the power set of

X

, then E is a subset of

P (X)

∕

\{ϕ\}

. In a hypergraph

H

, the size of the vertex set is called the order of the hypergraph, and the size of the edge set is called the size of the hypergraph.

3.2. Graph Convolutional Networks

The core of graph convolutional networks (GCNs) [25] is to extract spatial features of graph structures. Viewing the port information dissemination network as a graph structure helps to systematically analyze and understand the information flow and interaction relationships between ports. Through graph convolutional networks (GCNs), a series of operations can be effectively performed on this graph structure. A GCN is similar to traditional convolution operations, where graph convolution filters waves on the graph to extract feature representations of nodes. The difference is that GCNs do not rely on fixed-size local neighborhoods when considering neighbor information but focus on the structure of the entire graph. The information between nodes propagates through the edges of the graph. Through matrix multiplication and activation functions, each node can aggregate information from its neighboring nodes to update its own representation. A GCN is a multilayer graph convolutional neural network, where each layer only processes information from the first-order neighborhood. By stacking multiple convolutional layers, information transfer between higher-order neighborhoods can be achieved. This article proposes a GCN method that utilizes the topology of graph structures to comprehensively consider the information of each node and its neighboring nodes. It can capture complex high-order associations between ports, which helps to comprehensively understand the complexity of port information security and improve the ability to identify complex network structures.

Starting from the input layer, the forward propagation goes through the operation of the graph convolution layer followed by the operation of the softmax activation function to obtain the predicted classification probability distribution. The computation of the graph network is an iterative process that constantly considers the neighboring nodes and the features of node, and each iteration is a feature reorganization, i.e., the next layer of features for the previous layer of features of the graph convolution (1) [26], that is, the propagation between the layers:

X^{l + 1} = σ {\tilde{(D}}^{- \frac{1}{2}} \tilde{A} {\tilde{D}}^{- \frac{1}{2}} X^{l} W^{l})

(1)

X^{l + 1}

in Equation (1) is the feature matrix of the

l + 1

th layer, and

σ

is the activation function. When calculating the new node features, not only the information of the neighboring nodes but also the information of its nodes can be taken into account, so the representation on the adjacency matrix

A

is

\tilde{A} = A + I

, where

I

is the unit array. Similarly, for the degree matrix

D

, not only the number of neighboring nodes but also their own information should be taken into account. Therefore, the unit array

I

is added to the degree matrix

D

, the sum of each row of

\tilde{A}

,

\tilde{D} = D + I = \sum_{j} \tilde{A_{i j}}

and

{\tilde{A}}_{i j}

to represent the elements of the

i

rows and

j

columns in

\tilde{A}

. The matrix after matrix normalization is

\overset{\land}{A}

, as shown in Equation (2):

\overset{\land}{A} = {\tilde{D}}^{- \frac{1}{2}} \tilde{A} {\tilde{D}}^{- \frac{1}{2}}

(2)

The new reorganization characteristics at this point are shown in Equation (3):

X^{l + 1} = σ (\overset{\land}{A} X^{l} W^{l})

(3)

Then, the forward propagation equation for a two-layer GCN network is shown in Equation (4), where σ and softmax are the activation functions:

Z^{l + 1} = s o f t \max [σ (\overset{\land}{A} X^{l} W^{l})], l = 0, 1

(4)

3.3. Multilayer Perceptron

A multilayer perceptron (MLP) [27] is a basic feed-forward neural network model that consists of multiple fully connected hidden and output layers through multiple fully connected hidden and output layers. In the port information security network in this study, MLP detection is used to identify abnormal situations and take corresponding measures in a timely manner. For the hidden layer

l

, where

j

is the index of the neuron,

z_{j}^{l}

is the weighted input of the

j

th neuron, and

a_{j}^{l}

is the output after the activation function. The calculation of the hidden layer is shown in Equations (5) and (6):

z_{j}^{l} = \sum_{i = 1}^{n^{l - 1}} w_{i j}^{l} \cdot a_{i}^{l - 1} + b_{j}^{l}

(5)

a_{j}^{l} = f (z_{j}^{l})

(6)

In Equation (5),

n^{l - 1}

is the number of neurons in the previous layer,

w_{i j}^{l}

is the weight connecting the

i

neuron in the

l - 1

layer and the

j

neuron in the

l

layer and

b_{j}^{l}

is the bias of the

j

neuron in the

l

layer, and in Equation (6),

f (\cdot)

is the activation function. The output layer uses different activation functions according to the type of task, and the calculation equation of the output layer is similar to that of the hidden layer. The gradient of the loss function with respect to the model parameters is computed by the back-propagation algorithm, and then the parameters are updated using gradient descent or its variants to train the model to minimize the loss function. The parameter updates are shown in Equations (7) and (8):

w_{i j}^{l} = w_{i j}^{l} - α \frac{\partial L}{\partial w_{i j}^{l}}

(7)

b_{j}^{l} = b_{j}^{l} - α \frac{\partial L}{\partial b_{j}^{L}}

(8)

In Equations (7) and (8),

α

is the learning rate and

L

is the loss function.

Finally, the MLP evaluates the monitoring performance through the validation set and evaluates the generalization capability using the test set.

4. Research Methodology

In this paper, for the security problems of port information dissemination, a research method named port information security link prediction based on the HWA algorithm is proposed, and the overall flowchart of the method is shown in Figure 2. It is generally divided into three parts. Firstly, the data construction part involves constructing the basic relationships of the ports. By understanding the basic relationships between various entities in the ports, a general graph adjacency matrix is formed. Then, using the hypergraph-based self-attention forest prediction (HWA) algorithm proposed in this paper, the adjacency matrix of the general graph is constructed as a hypergraph association matrix. Next is the data preprocessing part, which uses the hypergraph adjacency correlation matrix as the input and the GCN as the model for training, continuously optimizing the loss function to obtain node feature vectors. Finally, there is the model evaluation section; taking node feature vectors as inputs, the MLP learns through node feature vectors to obtain edge scores for link prediction. Throughout the process, gradient descent and other optimization algorithms are used to update the model parameters to improve the predictive performance of the model. Finally, the model was evaluated using the area under the curve (AUC) and the F1 value as evaluation indicators.

4.1. Constructing the Data Model Based on the HWA Algorithm

In the research method of port information security link prediction based on the HWA algorithm, the first part is the data construction part, which is based on the adjacency matrix of the general graph to understand the relationship between nodes, and an algorithm based on HWA is proposed to construct the hypergraph association matrix. For some current real-world applications, the relationship between nodes can be extremely complex, and the edges of the general graph can only connect two nodes, and it is difficult to accurately represent complex relationships such as many-to-many, many-to-one, and so on; at this time, the general graph can not satisfy the demand, so the hypergraph is used to capture the complex relationship structure. For hypergraphs where node information is difficult to obtain, the first thing that should be carried out is to construct a hypergraph, and determining the number of hyperedges is a key issue in constructing a hypergraph, so this experiment represents the hypergraph association matrix by the adjacency matrix of the general graph. The target output of the algorithm defined in this experiment is the hypergraph association matrix

H

, and Figure 3 describes the HWA algorithm as a flow diagram.

Based on the existing network topology, this paper proposes the HWA algorithm to construct the hypergraph. The hypergraph constructed by this algorithm mainly reflects the higher-order attribute information, and in order to better capture the higher-order relationships in the hypergraph and improve the performance of the prediction, the self-attention mechanism is used. Through the self-attention mechanism, these relationships can be transformed into hyperedges in the hypergraph while preserving the complex relationships between nodes, thus better representing the connection patterns between nodes.

A major problem in the implementation of this algorithm is how to determine the total number of hyperedges. To solve this problem, this paper uses the node

e

as the center node of the hyperedges in the hypergraph

ξ

and then determines the non-center nodes in each hyperedge by the optimal coefficient matrix of the HWA algorithm

R^{*}

. Finally, the association matrix

H

of the hypergraph

ξ

is obtained by combining the variables of the adjacency matrix

A

of the general graph

G

.

The optimal coefficient matrix reflects the interactions between nodes in a general graph based on the local structure. The HWA algorithm to find the optimal coefficient matrix is shown in Equation (9), where

I

is the unit matrix,

A^{T}

is the transpose of the adjacency matrix of the general graph and

λ

is the penalty coefficient.

R^{*} = {(A^{T} A + λ I)}^{- 1} A^{T} A

(9)

The hypergraph association matrix equation shown in Equations (10) and (11) determines the higher-order nodes contained in each hyperedge based on a matrix

R^{*}

containing all possible interactions between the local structures of the network, where

ϕ : R \to \{0, 1\}

is a matrix mapping, controlled by a threshold

γ

.

\overset{\land}{H}

denotes the higher-order nodes contained in the hyperedge.

H

denotes the hypergraph association matrix, and

A

is the adjacency matrix of the general graph.

\overset{\land}{H} = ϕ (Z^{*}, γ)

(10)

H = {({\overset{\land}{H}}^{T} + A)}^{T}

(11)

By constructing the hypergraph through HWA algorithm, this paper obtains the association matrix of the hypergraph from the adjacency matrix of the general graph based on the network topology without the information of the hypergraph nodes. The algorithm utilizes the self-attention mechanism to demonstrate the degree of association between nodes during the formation of the hypergraph, thus better capturing the complex relationships between nodes. The method helps to subsequently capture the higher-order relationships in the hypergraph, enabling the hypergraph to more accurately express the feature information between the nodes and providing a reliable basis for further analysis and application.

4.2. Obtaining Node Feature Vectors Based on GCN

Traditional feature extraction methods only focus on the local neighborhood information of the nodes and cannot effectively capture the information in the global graph structure, and for large-scale graphs, traditional feature extraction methods increase the computational and storage costs intending to extract high-dimensional feature vectors, which reduces the generalization ability of the model. To avoid the above problems, GCNs are used to extract data features. The use of a GCN can fully utilize the provided graph structure information to obtain the feature representation of each node through inter-node information transfer, and residual connections are added to the GCN to solve the gradient propagation problem and maintain the integrity of information in deep networks.

In the data processing section, the hypergraph correlation matrix constructed in Section 4.1 is transformed into node features, and the GCN is used as the model for training during the transformation process. During the entire data processing process, the GCN first constructs the input hypergraph correlation matrix into a hypergraph structure; then, the node features are initialized. Next, the node features are multiplied with the hypergraph association matrix using the hypergraph convolution operation combined with a learnable hypergraph convolution kernel to capture the complex relationship between nodes and hyperedges and generate the updated node features. Subsequently, a nonlinear transformation is introduced via a nonlinear activation function to enhance the model representation. The updated node features are then aggregated to obtain a higher-level representation of the nodes. These steps are iterated several times to propagate the information throughout the network. Finally, the iterated node features are input to the output layer for link prediction task processing. The whole process completes the extraction of node features and task processing by optimizing the cross-entropy loss function and continuously adjusting the model parameters to minimize the difference between the predicted values and the real labels. Figure 4 describes the process of feature extraction by the GCN.

Adopting the GCN effectively utilizes the graph structure information to realize the information transfer between nodes and obtains the feature representation of each node through inter-node information transfer, which enables the model to better capture the local and global information of the graph data and improves its ability to characterize the node features and the performance of the model in terms of representing the node’s feature vectors well. Meanwhile, since graph data usually have a complex structure and a large number of nodes, the problem of gradient vanishing occurs in the deep network as the network depth increases. Therefore, in the above GCN process of obtaining node feature vectors, the residual connection is realized by adding the output of the first hidden layer with the input of the second layer so that the gradient can be directly propagated back to the earliest layer of the network through the residual connection, which, in turn, improves the training efficiency and performance of this model. The addition of residual connections also maintains the integrity of the input features and reduces the loss of information in the inter-layer transfer process.

The residual

\overset{\land}{ε}

, which is the gap between the model output and the actual output, is shown in Equation (12), where

y

is the output of the GCN model and

\overset{\land}{y}

is the desired output.

\overset{\land}{ε} = y - \overset{\land}{y}

(12)

In order to better capture the relationship between nodes and improve the representation ability of the model, a double-headed attention mechanism is added. It dynamically adjusts the importance between nodes by learning the attention coefficient and aggregates the nodes according to the characteristics of the nodes and the attention washing. The double-head attention mechanism introduces additional parameters and nonlinear transformation to improve the representation ability of the model and learn the attention weight. The model can automatically learn the importance and correlation of each node feature, thereby improving the ability of the model to model the different nodes of the soil structure in this paper.

The GCN is used to extract node feature information, fully utilize the graph structure to capture relationships and features between nodes, extract local features and preserve the global structure. This not only improves the flexibility of the model but also enhances the training efficiency and model performance. The hypergraph convolution kernel needs to maintain optimal learning performance throughout the entire iteration process. Therefore, in this paper, the node feature vectors are initialized in the data processing section, and residual connections are added in subsequent iterations to help effectively propagate node information between layers and alleviate the problem of gradient vanishing in the model. Due to the involvement of hypergraphs in this article, the high-order information contained in the hypergraph structure is processed between the layers of the model through activation functions. However, these activation functions may also face the problem of gradient vanishing during gradient propagation, so special attention needs to be paid to the impact of this phenomenon on the model’s performance. The saturation characteristics of the activation function and the complex high-order feature dependencies may lead to a gradual weakening of the gradient transmitted to each layer, thereby limiting the ability of effective learning. For the challenges encountered in this article, we chose the Relu activation function and added residual connections to alleviate the problem of gradient vanishing. The learnability of the hypergraph convolution kernel is effectively improved through the above two parts. Due to the involvement of hypergraphs in this article, the high-order information contained in the hypergraph structure is processed between the layers of the model through activation functions. However, these activation functions may also face the problem of gradient vanishing during gradient propagation, so special attention needs to be paid to the impact of this phenomenon on the model’s performance. The saturation characteristics of the activation function and the complex high-order feature dependencies may lead to a gradual weakening of the gradient transmitted to each layer, thereby limiting the ability of effective learning. For the challenges encountered in this article, we chose the Relu activation function and added residual connections to alleviate the problem of gradient vanishing. Residual connections not only help to address the problem of gradient vanishing, accelerate training and improve the model’s depth and expressiveness, but it can also help prevent overfitting and optimize the model’s performance. The addition of a dual-head attention mechanism improves the model’s ability to represent the graph structure applied in this paper. In summary, this model is suitable for processing complex graphical data, improving the model’s performance and generalization ability and providing a reliable basis for subsequent link prediction tasks and applications.

5. Experimentation

5.1. Experimental Environment

The data construction and data extraction of this experiment and the training and evaluation of the model were conducted in the environment of Windows 10, and the specific environment settings are shown in Table 1.

In order to evaluate the performance of the model proposed in this experiment for link prediction, the AUC and F1 metrics were used in this study, and the F1 metric considers both the precision and recall of the classification model, as shown in the specific evaluation metrics Equations (13)–(16).

The AUC and F1 value of binary classification were the evaluation indicators used in this experiment. The AUC can comprehensively measure classification algorithms, and when the AUC score exceeds 0.5, it can measure the performance of link prediction algorithms. In the comparison of

a

independent scores, when there are

a^{'}

times when the positive edge score is higher than the negative edge score, and

a^{″}

has the same score, the calculation equation for the AUC is defined as per Equation (13) [28]:

A U C = \frac{a^{'} + 0.5 a^{″}}{a}

(13)

Precision represents the ratio of the number of samples correctly predicted as positive cases to the number of samples of all true positive cases and is calculated by Equation (14) [29]:

P = \frac{T P}{T P + F P}

(14)

Recall represents the ratio of the number of samples correctly predicted as positive cases to the number of all samples predicted as positive cases and is calculated by Equation (15) [30]:

R = \frac{T P}{T P + F N}

(15)

The F1 value is used to measure the values of precision and recall and is the reconciled mean of these two values, where [31]:

F = \frac{(1 + a^{2}) P R}{a^{2} (P + R)}

(16)

The confusion matrix has four markings:

T P

(true proportion),

F P

(false proportion),

F N

(false negative) and

T N

(true counterexample), which denote true positives, false positives, false negatives and true negatives, respectively. Generally speaking, a higher value of these three evaluation indexes represents a better performance of the classification model.

5.2. Datasets

This section uses eight publicly available network datasets (doi: https://github.com/pinglanchu/NSLR-HMANN (accessed on 19 March 2024)). These datasets relate to biological, social, political and voting networks. The basic features and some static topological statistics of the eight real networks are shown in Table 2.

N

denotes the number of nodes, and

M

denotes the number of links.

ρ

is the network density, and

K_{c s g}

is the number of connected subgraphs.

5.3. Experimental Results

In order to verify the effectiveness of the combination of the HWA algorithm and the GCN model proposed in the previous section, link prediction was carried out in this study using the eight real network datasets mentioned in Section 5.2 as an example. Table 3 lists the prediction results of the prediction model proposed in this paper on the eight datasets. By analyzing the data in the following table in detail, it can be seen that the model proposed in this paper showed excellent performance on the eight real network datasets mentioned above. These results strongly prove the validity and reliability of the model proposed in this experiment.

In this experiment, three advanced GNN-based methods were selected as benchmarks. They are a variant of the graph neural network GIN [32], the hypergraph neural network framework HGNN [33,34] and the hypergraph neural network HyperGCN [34]. Table 4 and Table 5 show the results of the AUC and recall obtained by the model in this experiment compared with the other algorithms. Through comparison, it was found that the performance of the model in this experiment was almost the best in terms of link prediction.

Table 4 shows the AUC indicators of the model proposed in this paper and of the other baseline models under the different test sets. From the data in the table, it can be concluded that the experimental model proposed in this paper almost had the best performance in terms of link prediction. For example, in the Bio-CE-GT real network data, the AUC index of the model proposed in this paper was 7.49%, 1.01% and 1.82% higher than that of the three baseline models. However, it can be found from the figure that in the PB real network data, the HGNN model was not much different from the model proposed in this experiment. The main reason for this situation is that hypergraph neural networks are trained directly on hypergraphs, so the structural characteristics of the hypergraphs directly affect the high-order structural information extracted by the network. From this point of view, the model proposed in this experiment was not always the most effective, but the overall effect was the best, so it can be confirmed that the model proposed in this experiment is effective.

Table 5 shows the F1 indicators of the model proposed in this paper and of the other baseline models under the different test sets. From the data in the table, it can be concluded that the experimental model proposed in this paper almost had the best performance in terms of link prediction. For example, in the Facebook real network data, the F1 index of the model proposed in this paper was 14.81%, 7.03% and 11.93% higher than that of the three baseline models. From Table 4 and Table 5, it can be found that the performance of the model proposed in this paper was not stable in the different real networks, but overall, it was better than the other baseline models in terms of link prediction. Through in-depth analysis of the data shown in Table 4 and Table 5, it was found that the model proposed in this paper was not stable in the different actual networks. However, in general, the model proposed in this paper was superior to the other baseline models in terms of link prediction. The results of this study demonstrate the effectiveness of hypergraph neural networks in link prediction tasks, especially in mining high-order structural information. Compared with related works in the literature, such as the study by Li K. et al. (2020) [11], their proposed improved similarity-based complex network link prediction method performed well on certain datasets but fell short in handling complex high-order relationships. Our model successfully captured this high-order information by training directly on the hypergraph, thus achieving better performance in multiple testing scenarios. This observation suggests a deeper understanding of the performance of the model in this experiment. Although the performance was unstable in the different network environments, the evaluation of its overall performance in this experiment shows that the model in this experiment has obvious advantages in link prediction tasks.

6. Conclusions

6.1. Summary

The aim of this experiment was to apply high-order structural information to solve link prediction problems in real-world networks with unknown node attribute information. Traditional link prediction models mainly rely on accurate node attribute information and are not suitable for user information with privacy. Therefore, this paper proposes the HWA algorithm, which constructs a hypergraph through a general graph with known network basic structural information to solve the problem of node attribute information having privacy. The problem of capturing and utilizing high-order structural information embedded in the network was solved by constructing a hypergraph using the HWA algorithm; also, due to the modeling of hypergraphs in real networks, hyperedges contain both low-order and high-order structural information. The hypergraph learning method based on the improved GCN framework in this article utilizes a hypergraph correlation matrix to obtain node feature vectors, effectively overcoming the problem of gradient vanishing and better capturing node feature information. The experimental results show that our model exhibits excellent performance in link prediction tasks, providing an effective solution for solving prediction problems in complex networks.

6.2. Expectation

Although the large number of experiments conducted in this study show that the model proposed in this paper has superior performance, there are still some limitations. In this paper, only linear interactions between local structures were considered, and general features of complex networks, such as sparsity and redundancy of local structures, were not taken into account. This limitation may affect the performance of the model proposed in this experiment and lead to unstable performance in some real networks. In addition, due to the fact that hypergraph neural networks are trained directly on hypergraphs, the model proposed in this paper does not have an absolute competitive advantage. Therefore, future research directions should focus on optimizing feature engineering and adjusting the model’s architecture to further enhance the performance and adaptability of the model. This will help to more effectively utilize high-order information in hypergraph structures, thereby driving progress in related tasks such as link prediction. At the same time, more network features should be considered, and a better balance in terms of model design should be found to achieve more stable and efficient performance.

Author Contributions

Conceptualization, methodology, validation, investigation and data curation, Z.X., Z.Z., L.B. and X.Y.; software and visualization, Z.X. and Z.Z.; writing—original draft preparation and writing—review and editing, Y.L.; supervision, X.Y., L.B., Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 61972334).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in [Section 5.2].

Acknowledgments

Thank you to Xubo Zhao and Chengfei Ma for the help.

Conflicts of Interest

Author Zhixin Xia was employed by the company Hebei Port Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Shin, G.; Hong, S.; Lee, J. Network Security Node-Edge Scoring System Using Attack Graph Based on Vulnerability Correlation. Appl. Sci. 2022, 12, 6852. [Google Scholar] [CrossRef]
Rozenman, G.; Kundu, N.; Liu, R. The quantum internet: A synergy of quantum information technologies and 6G networks. IET Quantum Commun. 2023, 4, 147–166. [Google Scholar] [CrossRef]
Cui, D.; Sun, G.; Zhan, X. Security Risk Management System for the Construction and Operation of Smart Port Area Based on BP Neural Network Algorithm. Procedia Comput. Sci. 2023, 228, 838–846. [Google Scholar] [CrossRef]
Bruzzone, A.G.; Mairne, M. Promoting Safety, Security, Awareness and Productivity in Port Plants. Procedia Comput. Sci. 2024, 232, 358–367. [Google Scholar] [CrossRef]
Fernandez, E.B.; Romero, V.M. A security reference architecture for cargo ports. Internet Things Cyber-Phys. Syst. 2022, 2, 120–137. [Google Scholar] [CrossRef]
Gunes, B.; Kayisoglu, G.; Bolat, P. Cyber security risk assessment for seaports: A case study of a container port. Comput. Secur. 2021, 103, 102196. [Google Scholar] [CrossRef]
Wu, X.; Jing, Z.; Wang, X. The security of IOT from the perspective of the observability of complex networks. Heliyon 2024, 10, e27104. [Google Scholar] [CrossRef]
Zhou, M.; Liu, H.; Liao, H. Finding the key nodes to minimize the victims of the malicious information in complex network. Knowl.-Based Syst. 2024, 293, 111632. [Google Scholar] [CrossRef]
Wang, K.; Tu, Z.; Ji, Z.; He, S. Multi-stage data synchronization for public blockchain in complex network environment. Comput. Netw. 2023, 235, 109952. [Google Scholar] [CrossRef]
Qiao, L.; Wu, M.; Zhao, M. Identification of Key Nodes in Complex Networks. In Proceedings of the 2021 the 7th International Conference on Computer and Communications, Chengdu, China, 10–13 December 2021; pp. 2230–2234. [Google Scholar]
Li, K.; Tu, L.; Chai, L. Ensemble-model-based link prediction of complex networks. Comput. Netw. 2020, 166, 106978. [Google Scholar] [CrossRef]
Guo, J. Research on Link Prediction Algorithm in Complex Networks. Master’s Dissertation, Lanzhou Jiaotong University, Lanzhou, China, 2023. [Google Scholar]
Munikoti, S.; Das, L.; Natarajan, B. Scalable graph neural network-based framework for identifying critical nodes and links in complex networks. Neurocomputing 2022, 468, 211–221. [Google Scholar] [CrossRef]
Zhang, Z.; Cui, L.; Wu, J. Exploring an edge convolution and normalization based approach for link prediction in complex networks. J. Netw. Comput. Appl. 2021, 189, 103113. [Google Scholar] [CrossRef]
Xiu, Y.; Liu, X. An extended self-representation model of complex networks for link prediction. Inf. Sci. 2024, 662, 120254. [Google Scholar] [CrossRef]
Wang, Y. Research on Key Algorithms of Link Prediction Based on Complex Network Structure. Master’s Dissertation, School of Information and Communication Engineering, Beijing, China, 2023. [Google Scholar]
Cheng, T. Research on Higher-Order Link Prediction Methods Based on Network Representation Learning. Master’s Dissertation, Lanzhou University of Technology, Lanzhou, China, 2024. [Google Scholar]
Yi, T.; Zhang, S. Link prediction based on higher-order structure extraction and autoencoder learning in directed networks. Knowl.-Based Syst. 2022, 241, 108241. [Google Scholar] [CrossRef]
Ru, H.; Feng, M. Exploring network reliability by predicting link status based on simplex neural network. Displays 2023, 79, 102457. [Google Scholar]
Lu, Y.; Gao, M. Neighborhood overlap-aware heterogeneous hypergraph neural network for link prediction. Pattern Recognit. 2023, 144, 109818. [Google Scholar] [CrossRef]
Liu, W.; Cai, H.; Cheng, X. Learning high-order structural and attribute information by knowledge graph attention networks for enhancing knowledge graph embedding. Knowl.-Based Syst. 2022, 250, 109002. [Google Scholar] [CrossRef]
Ren, Y. Research on Hypernetwork Link Prediction Method Based on Multidimensional Feature Mining. Master’s Dissertation, PLA Strategic Support Force Information Engineering University, Zhengzhou, China, 2024. [Google Scholar]
Fu, K.; Chang, W.; Yan, G.; Luo, H. Research on Link Prediction Algorithm Integrating High-order Information and Node Centrality in Network. In Proceedings of the 2023 8th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 21–23 April 2023; pp. 1391–1395. [Google Scholar]
Gao, Y.; Ji, S.; Han, X.; Dai, Q. Hypergraph Computation, Engineering. 2023. Available online: https://doi.org/10.1016/j.eng.2024.04.017 (accessed on 25 May 2024).
Fu, S.; Liu, W. Example-feature graph convolutional networks for semi-supervised classification. Neurocomputing 2021, 461, 63–76. [Google Scholar] [CrossRef]
Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
Xiang, M.; Zhou, B. MCMP-Net: MLP combining max pooling network for sEMG gesture recognition. Biomed. Signal Process. Control 2024, 90, 105846. [Google Scholar]
Chai, L.; Tu, L. Hypergraph modeling and hypergraph multi-view attention neural network for link prediction. Pattern Recognit. 2024, 149, 110292. [Google Scholar] [CrossRef]
Peretz, O.; Koren, M.; Koren, O. Naive Bayes classifier—An ensemble procedure for recall and precision enrichment. Eng. Appl. Artif. Intell. 2024, 136, 108972. [Google Scholar] [CrossRef]
Hinduja, S.; Nourivandi, T.; F.Cohn, J.; Canavan, S. Time to retire F1-binary score for action unit detection. Pattern Recognit. Lett. 2024, 182, 111–117. [Google Scholar] [CrossRef] [PubMed]
Xu, K.; Hu, W.; Leskovec, J. How Powerful Are Graph Neural Networks? ICLR: New Orleans, LA, USA, 2019. [Google Scholar]
Feng, Y.; You, H.; Zhang, Z. Hypergraph Neural Networks. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu, HI, USA, 27 January 2019. [Google Scholar]
Gao, Y.; Feng, Y. HGNN(+): General Hypergraph Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 3181–3199. [Google Scholar] [CrossRef]
Chen, D.; Lin, Y.; Li, W. Measuring and Relieving the Over-Smoothing Problem for Graph Neural Networks from the Topological View. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, USA, 7 February 2020. [Google Scholar]

Figure 1. Schematic diagram of graph and hypergraph.

Figure 2. The overall flow diagram of the research method of port information security link prediction based on the HWA algorithm. The graph comprises three parts: (1) data construction, where the adjacency matrix is constructed from port relationships and transformed into a hypergraph adjacency matrix using the HWA algorithm; (2) data preprocessing, where the hypergraph adjacency matrix is input to a GCN model for training, generating optimized node feature vectors; and (3) model evaluation, where the MLP utilizes these vectors for edge scoring in link prediction, with the model performance evaluated using the AUC and F1 metrics.

Figure 3. The schematic diagram of the HWA algorithm. The adjacency matrix of the general graph is formed into a hypergraph adjacency matrix through the HWA algorithm, and a self-attention mechanism is added to better capture high-order relationships in the hypergraph and improve prediction performance.

Figure 4. Obtaining node feature vectors by the GCN. Firstly, the hypergraph structure is constructed based on the input hypergraph association matrix. Next, by combining hypergraph convolution with learnable hypergraph convolution kernels, complex node and hyperedge relationships are captured to generate updated node features. Model expression is enhanced by activating functions and aggregating updated node features. These steps are iterated multiple times. Finally, the iterative node feature input and output layers are used for link prediction.

Table 1. Experimental environment settings.

Hardware Environment	Software Environment
GPU NVIDIA GTX 3060 (NVIDIA, Santa Clara, CA, USA)	Windows 10
Memory 32 GB	Python 3.8.19
CUDA11.6.0	Torch 1.13.0
cuDNN 8.0.5	Anaconda 3

Table 2. Basic feature data representation of the dataset.

Datasets	N	M	$ρ$	$K_{c s g}$
Bio-CE-GT	924	3239	0.0076	13
Celegans	297	2148	0.0489	1
Ecoli	761	2276	0.0079	35
Facebook	786	14,024	0.0455	4
Jazz	198	2742	0.1406	1
Metabolic	453	2025	0.0198	1
PB	1224	16,715	0.0223	2
Soc-wiki-vote	889	2914	0.0074	1

Table 3. AUC and F1 values for link prediction by the model proposed in this experiment.

	Bio-CE-GT	Celegans	Ecoli	Facebook	Jazz	Metabolic	PB	Soc-wiki-vote
AUC	0.6830	0.6521	0.7006	0.7941	0.7793	0.7040	0.7842	0.6410
F1	0.6788	0.6515	0.6903	0.7940	0.7892	0.6978	0.7838	0.6300

Table 4. Results of AUC comparison between this experimental model and the four baseline models for link prediction.

Algorithm	Bio-CE-GT	Celegans	Ecoli	Facebook	Jazz	Metabolic	PB	Soc-wiki-vote
GIN	0.6081	0.5884	0.5970	0.6497	0.6639	0.6273	0.7455	0.5759
HGNN	0.6729	0.6419	0.6748	0.7273	0.7091	0.6974	0.7827	0.6386
HyperGCN	0.6648	0.6461	0.6708	0.6810	0.7439	0.6825	0.7376	0.6225
Ours	0.6830	0.6521	0.7006	0.7941	0.7793	0.7040	0.7842	0.6410

Table 5. Comparison results of F1 values for link prediction by this experimental model and four baseline models.

Algorithm	Bio-CE-GT	Celegans	Ecoli	Facebook	Jazz	Metabolic	PB	Soc-wiki-vote
GIN	0.6024	0.5856	0.5905	0.6459	0.6632	0.6215	0.7438	0.5713
HGNN	0.6597	0.6324	0.6614	0.7237	0.7053	0.6901	0.7815	0.6208
HyperGCN	0.6530	0.6404	0.6569	0.6747	0.7341	0.6735	0.7356	0.6072
Ours	0.6788	0.6515	0.6903	0.7940	0.7892	0.6978	0.7838	0.6300

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xia, Z.; Zheng, Z.; Bai, L.; Yang, X.; Liu, Y. A Research Approach to Port Information Security Link Prediction Based on HWA Algorithm. Appl. Sci. 2024, 14, 10646. https://doi.org/10.3390/app142210646

AMA Style

Xia Z, Zheng Z, Bai L, Yang X, Liu Y. A Research Approach to Port Information Security Link Prediction Based on HWA Algorithm. Applied Sciences. 2024; 14(22):10646. https://doi.org/10.3390/app142210646

Chicago/Turabian Style

Xia, Zhixin, Zhangqi Zheng, Lexin Bai, Xiaolei Yang, and Yongshan Liu. 2024. "A Research Approach to Port Information Security Link Prediction Based on HWA Algorithm" Applied Sciences 14, no. 22: 10646. https://doi.org/10.3390/app142210646

APA Style

Xia, Z., Zheng, Z., Bai, L., Yang, X., & Liu, Y. (2024). A Research Approach to Port Information Security Link Prediction Based on HWA Algorithm. Applied Sciences, 14(22), 10646. https://doi.org/10.3390/app142210646

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Research Approach to Port Information Security Link Prediction Based on HWA Algorithm

Abstract

1. Introduction

2. Related Work

3. Preliminaries

3.1. Hypergraph

3.2. Graph Convolutional Networks

3.3. Multilayer Perceptron

4. Research Methodology

4.1. Constructing the Data Model Based on the HWA Algorithm

4.2. Obtaining Node Feature Vectors Based on GCN

5. Experimentation

5.1. Experimental Environment

5.2. Datasets

5.3. Experimental Results

6. Conclusions

6.1. Summary

6.2. Expectation

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI