1. Introduction
In complexity science, a network (denoted by
in graph theory) is composed of a node set
consisting of
n nodes and an edge set
E consisting of
m edges between the nodes. Many real-world networks such as the Internet, WWW, large-scale power networks, transportation networks and interpersonal networks can be modeled in this concise way [
1]. Using this method, these networks can be regarded as a collection of nodes with independent characteristics interconnected with other individuals. Each individual is regarded as a node in the network, and the connection between nodes is regarded as the edge of the network. This abstract method can intuitively show the topology of the real network, and also provides an effective research method for understanding the state and the function of the real network [
2].
However, with the continuous development of technology and society, epidemic viruses [
3], computer viruses [
4], misinformation [
5], or corruption [
6] have more serious negative effects in the human world. However, removing or deactivating a part of the key nodes through the network dismantling method in the network to decompose the network into several isolated sub-parts can effectively protect the robustness of the network, control the dynamic behavior of the network, and curb the negative effects in the network mentioned above. Previous studies proved that this method to remove or deactivate the key nodes can effectively curb the spread of epidemics in the population [
7], prevent the spread of misinformation through social networks [
8] and prevent the spread of viruses in computer networks [
9]. Some studies on complex networks choose a set of node subsets
S in the network with an optimal method, and explore the influence of removing
S on the network characteristics. For example, exploring how the maximum connected (strong) subset of the network will change after removing
S, in the example of epidemics or network viruses transmission, if
S is isolated or infected first, the impact on the speed of virus transmission can be determined [
10]?
In the actual situation, it will produce a certain cost consumption
C when selecting and removing the node subset
S in the network. In the epidemic propagation model, the vaccination of the node requires a certain socioeconomic cost. Removing different node sets in the computer virus or public opinion propagation network consumes different resources in the actual situation. Therefore, a combinatorial optimization problem is generated, whereby under the influence of the constraint removal node subset
S in the network, the removal cost
C is minimized. Additionally, removing nodes will destroy the network structure and affect the function of the network, so it is necessary to remove as few nodes as possible and find a set of nodes with the lowest removal cost
C, that is, the minimum disassembly set (we consider the most common situations where the disassembly cost is the unit cost; the minimum disassembly set is the set with the least number of nodes) and remove it. After the network is decomposed into multiple sub-parts that are not connected, network disassembly is achieved [
11]. Finding the minimum disassembly set is an NP-hard problem [
12]. For this kind of problem, only effective approximation algorithms or heuristic algorithms can be found at present. For the disassembly problem, there have been some recursive algorithms based on the degree or centrality of the nodes. For example, in 2015, a heuristic algorithm called ‘collective’ influence (CI) [
13] was proposed, which determines the ownership of the nodes in an undirected random network according to the degree of nodes and the degree of local neighbor nodes; in 2016, Alfredo Braunstein proposed a three-segment minimum sum (hereafter referred to as the Min-Sum) method for dismantling large random undirected networks [
11]; for large undirected random networks, this method is a more effective dismantling algorithm. In 2019, Ren proposed a general network disassembly (hereafter referred to as the GND) method for undirected weight networks [
14]. In addition, some relatively new disassembly methods and analyses of disassembly [
15,
16] are provided, such as the disassembly algorithm based on the message passing model (2020) [
17] and the sensitive disassembly method of neighbor connection (2020) [
18].
Most of the existing disassembly algorithms are carried out in undirected networks, while there are few disassembly algorithms for directed networks. However, in many real-world networks (such as WWW networks, acquaintance relationships, network email networks, text association networks and article citation networks, etc.), the edges between nodes are unidirectionally connected, and there is no mutual relationship in undirected networks [
19]. The existing disassembly methods are sometimes not suitable for the disassembly of these directed networks. Compared with disassembly in the undirected network, disassembly in the directed network needs to consider the direction of the edges between the nodes in the network. For example, in a public opinion network, when an incoming node (Innode) connected by a directed edge
e (also known as an arc in graph theory) is a communicator, the outgoing node of this edge (Outnode) is not likely to be propagated by the node. When applying the undirected network-based disassembly method to disassemble the network, this one-way relationship between nodes is sometimes ignored, resulting in the removal of this edge
e, and causing unnecessary disassembly to affect the disassembly effect.
The influence of the internal mechanism of this network on the disassembly is ignored in the traditional disassembly method. To solve this problem, the non-backtracking matrix representing the edge adjacency relationship is applied as the operator of spectral division, whereby it retains the directionality of the node relationship, and combines the disassembly of the directed network with the spectral division method. An improved spectral disassembly method for directed networks (hereafter referred to as the DIR method) is proposed; the edges in the directed network are directly applied as the disassembly unit, and the strongest connected subgraph of the directed network is used as the disassembly subject. The spectral characteristics of the edge adjacency matrix (non-backtracking matrix) are applied to the bipartition of the edge modules in the maximum strongly connected subset. The overlapping node set of the node sets connecting the two edge modules is then found as the minimum disassembly set to disassemble the network until the disassembly scale reaches the specified disassembly scale of the network (the maximum number of nodes in the strongly connected subgraph). The excellent characteristics of the non-backtracking matrix can be made full use of by using the DIR method, and the DIR method can greatly protect the topology of the network during the disassembly process. Furthermore, it is verified by experiments that the DIR method is suitable for the disassembly of directed networks. Finally, the influence of different disassembly methods on the network structure is analyzed by analyzing the changes of network indexes such as the clustering coefficient, assortativity coefficient and modularity function in the disassembly process, and it is verified that the application of edge module partition to disassemble the network can greatly retain the structural information in the networks.
2. Related Works
As described in the previous section, many network dismantling methods have been proposed in recent years. Next, I will introduce two methods that are compared with this article, namely the GND algorithm and Min-Sum algorithm.
GND method. This method considers the case where the node removal cost is equal to the node weight and is not a unit cost. First, perform spectral division of the Laplacian matrix for which the operators are node weights of the network. After the division is successful, the node weight coverage algorithm is applied to the edges connecting different divisions to find the minimum weight point set that can divide the network, so as to find the minimum cost disassembly set. Compared with the previous algorithm, GND is more general and applicable. It considers the influence of node weights in undirected networks on the network disassembly problem. However, the operator in the GND method is a node adjacency matrix.
Min-Sum method. Braunstein et al. proposed a three-stage Min-Sum algorithm to dismantling networks. They first decycle a network with a variant of the Min-Sum message passing algorithm. After all cycles are broken, they break the remaining trees into small components until the largest component is smaller than the desired threshold. Finally, they refine the node set of network dismantling by moving some of them back to the original network. However, this kind of method tends to delete irrelevant nodes during the loop removal step and then moves them back to the original network in the following node re-inserting step, which reduces the disassembly efficiency.
However, when analyzing directed networks, most spectral methods using node adjacency matrices will use symmetric adjacency matrices to make the network undirected [
20]. This processing method will inevitably lose some information in the network [
21], resulting in the search process for the minimum set of nodes to be disassembled in the directed network which will add some unnecessary nodes and cause unnecessary disassembly.
5. Experimental Results
In order to verify the applicability of the DIR method in directed networks, it is used in artificial directed ER networks, BA networks and real networks, and the disassembly results are compared with two commonly used methods (GND algorithm and Min-Sum method). The dataset of
Table 1 is selected in the experiment, and the experimental comparison is carried out in different artificial directed networks and large-scale real networks (for the convenience of comparison, the disassembly scale and disassembly cost in this paper are both proportional).
Some scholars, e.g., Ren [
14] have proved that the GND method has a higher disassembly efficiency than other algorithms such as Min-Sum and information transfer in undirected networks when the network disassembly cost is the unit cost (number of nodes) and non-unit cost (based on node degree). When the disassembly scale is the same, the GND method has a lower disassembly cost than other algorithms, and the GND method can destroy the network structure with a smaller disassembly cost. This method has better performance than other algorithms in the disassembly of undirected networks. The DIR method is compared with the GND method and Min-Sum method, considering that the disassembly cost is the unit cost (i.e., the number of nodes).
Figure 2 and
Figure 3 are the disassembly results of different methods in different directed networks, where the corresponding curves of disassembly scale and disassembly cost are provided. The ordinate disassembly scale is the proportion of the number of nodes in the largest (strongly) connected subgraph, and the abscissa disassembly cost is the proportion of the number of nodes in the smallest disassembly set when disassembly is at the scale shown in the ordinate. As shown in
Figure 2, in a dense ER random network, when the required disassembly size is less than 0.25, the disassembly cost of the DIR method is smaller than that of the GND and Min-Sum methods; in the artificial BA directed network with an average degree of 10, the disassembly cost of the DIR method is significantly lower than that of the other two methods. Additionally, in the relatively sparse real directed network (
Figure 2), when the network is disassembled to the same specified scale, the DIR method has a lower cost than other methods. The reason for the difference is that methods such as GND and Min-Sum take the largest connected subgraph in the network as the disassembly subject. The DIR method takes the largest strongly connected subset of the network as the disassembly subject. When the network is dense enough, the size of the largest strongly connected subgraph and the connected subgraph in the directed network is not hugely different; however, in the relatively sparse network (such as the Weki network), the cost of applying the maximum strongly connected subgraph of the directed network for disassembly is significantly lower than that of the GND and Min-Sum methods. The experiments show that the DIR method has the advantage of lower cost in directed network disassembly, which shows the efficiency of the DIR method in directed network disassembly.
Next, we explore the impact of different disassembly methods on the network structure. By applying different disassembly methods to different networks and comparing the clustering coefficient [
26] and assortativity coefficient [
27] of the disassembly process network, the superiority of the DIR method to retain the network structure information to a great extent is proved.
The clustering coefficient in graph theory is used to measure the degree of node aggregation. There is evidence that in most real-world networks, especially in social networks, nodes tend to create relatively tightly connected groups; this possibility is often greater than the average probability of randomly establishing a relationship between two nodes. A network such as
is formally composed of a set of nodes and edges between nodes, with edges connecting nodes. The neighborhood
of a node
is defined as its adjacent node,
. The local clustering coefficient
of a node in a directed network is
As an alternative to the global clustering coefficient, Watt and Strogatz [
19] use the average of local clustering coefficients of all vertices as the overall clustering level of the network.
Here, we compare the influence of the DIR and the other two disassembly methods on the degrees of node connection in the network by observing the change of the global clustering coefficient of the network during the disassembly process, and then explore the impact on the network structure.
The experiment first disassembles the artificial ER random network and the BA network; the relationship between the disassembly cost and the clustering coefficient is shown in
Figure 4. When the disassembly cost is less than 0.7 in the artificial ER random network, the curve of the average clustering coefficient corresponding to the DIR method is more stable than the curve of the GND and Min-Sum method, and it also reaches a stable state first in the BA network. The DIR method has less disturbances for the clustering coefficient of the whole network compared to the GND and Min-Sum method, which reflects the superiority of removing overlapping nodes between modules by dividing the edge modules. The influence of the DIR method on the network structure is smaller than that of directly deleting nodes in the network; in the ER random network, the three methods will increase the network clustering coefficient with the disassembly in a certain period of time. This is because the disassembly has caused an increasing number of nodes in the network to appear in clusters. The result of the experiment in the real network is shown in
Figure 5. It can be seen that in the real-world directed network, with the increase in the disassembly cost, the three disassembly methods will reduce the clustering coefficient in the network, which is related to the sparsity of the real network. The dismantling of the nodes of the real network will reduce the agglomeration between nodes and the connection between groups will become sparse; however, it can still be seen from the graph that the curve corresponding to the DIR method is more gentle than the GND and Min-Sum methods. The influence of this aggregation phenomenon on the real network during the disassembly process is smaller than that of the two methods; and relatively speaking, the Min-Sum method will have a more obvious impact on the aggregation phenomenon of the network, because the Min-Sum method tends to remove nodes with large degrees in the network and is less able to protect the structural information of the network.
The coefficient of assortativity is used to measure whether the network is assortative or disassortative. It is used to investigate whether the nodes with similar values of degree in the network tend to be connected to its approximate nodes. It can be characterized by the Pearson coefficient r (degree-degree correlation). indicates that the entire network presents an assortative structure, and the nodes with large degrees tend to be connected to the nodes with large degrees. indicates that the entire network presents disassortativity, and r = 0 indicates that there is no correlation in the network structure. In the experiment, the change of the network structure by the dismantling of the DIR method is analyzed by observing the influence of the dismantling process on the network assimilation index.
As shown in
Figure 6 and
Figure 7, the changes of the assortativity in the network of the DIR method are less in number than in the other two methods. Whether in the ER random network or in the real network, when the disassembly cost is less than 0.7, the blue curve is more gentle, the change of the global assortativity of the network is smaller, and the influence of removing the overlapping nodes between the edge modules on the assortativity of the network is smaller than that of the GND and Min-Sum methods.
The module degree [
28] is used as a performance index to measure the community division. It is used to see the impact on the structure of the network community when we disassemble the network. The module degree function is
, where
is an element in the point adjacency matrix
,
,
is the community to which node
i and
j belong to, and
is the membership function. If
i and
j are in the same community, it is 1, otherwise it is 0.
The calculation of the module degree
Q in the disassembly process of the DIR method is shown in
Figure 8. It can be seen that when the disassembly cost is less than 0.8 in the picture, the module degree
Q increases with the increase in the disassembly cost. It shows that the removal node in the disassembly process also deletes the inter-group edges between different communities, which plays a certain role in promoting effective community division. When the cost is greater than 0.88, the disassembly will destroy the inter-group edges within the community and cause the modularity
Q to decrease sharply.
In summary, the directed network disassembly (DIR) method we proposed in all the experiments has a higher disassembly efficiency than the GND and Min-Sum methods in both artificial directed networks and real directed networks. When the network is disassembled to the same scale, the DIR method incurs the lowest cost; at the same time, by comparing the clustering coefficient and the assortative coefficient in the disassembly process, it is also proved that the DIR method can reduce the influence of disassembly on the network clustering coefficient and the assortative coefficient in the disassembly process, and can also effectively retain the information in the network; the influence of the DIR method on network modularity is also explored through experiments. When the disassembly scale is less than a given threshold, the DIR method has a certain promoting effect on network community division.