Identifying and Ranking Influential Nodes in Complex Networks Based on Dynamic Node Strength
Round 1
Reviewer 1 Report
In this work Authors propose a new method for dynamic decomposition of network / nodes' rankings in order to maximize the effect of diffusion. This method has been compared to a number of other methods using the SIS process and the set of compared methods did include k-shell (which relates to DNSD) and other baselines used in this task. The results indicate that in fact DNSD can contribute to the field.
However, I am expecting a number of concerns to be addressed:
1) firstly, authors do introduce their S metric that takes into account second-order neighbourhood. In general, there are works that actually confirm that the second-order neighbourhood can contribute to the results, e.g. [1]. However, I would like the authors to explain the rationale on why the second order neihbourhood is considered by them an important step forward compared to k-shell. This is not explicitly stated in the work.
2) secondly, it is hard to overlook the power of weak ties that has been already underlined in a number of recognized works, e.g. [2]. However, authors still prefer to focus on the degree - most likely to keep the adequateness with k-shell concept. However, could authors compare themselves against centrality(betweenness)-based metrics? Since it seems that this is missing. Moreover, I think that other methods in Table 1 are not described in the manuscript - this should be either made more clear in the manuscript (I did not find MDD and Cnc description) or described if it is not there.
There are minor typos in the work, but careful proofreading should get rid of them. Please also make sure that you use spaces (" ") where needed, since sometimes the text is simply sticked, e.g. lines 112-113.
@article{kajdanowicz2016learning, title={Learning in unlabeled networks--An active learning and inference approach}, author={Kajdanowicz, Tomasz and Michalski, Rados{\l}aw and Musial, Katarzyna and Kazienko, Przemys{\l}aw}, journal={AI Communications}, volume={29}, number={1}, pages={123--148}, year={2016}, publisher={IOS Press} } @article{granovetter1973strength, title={The strength of weak ties}, author={Granovetter, Mark S}, journal={American journal of sociology}, volume={78}, number={6}, pages={1360--1380}, year={1973}, publisher={University of Chicago Press} }Author Response
Thanks for your advice. Here is our reply:
(1) reply to "1) firstly, authors do introduce their S metric that takes into account second-order neighbourhood. In general, there are works that actually confirm that the second-order neighbourhood can contribute to the results, e.g. [1]. However, I would like the authors to explain the rationale on why the second order neihbourhood is considered by them an important step forward compared to k-shell. This is not explicitly stated in the work.":
You are right. In fact, the effect of neighbors in some methods has been confirmed in some works[1,2,3]. However, sometimes it doesn't work well[4,5]. The effect of neighbors depends on method mechanism itself. In our method, the process of decomposing networks could be regarded as an inverse process of spreading from influential nodes. When each edge is weighted, actually the importance of spreading paths are distinguished. The more neighbors are considered, the more accurately the spreading is predicted. In theory, it's a choice between efficiency and accuracy. Considering the computational complexity, we only apply the second-order neighbourhood.
We supplement that the reason why neighbors work in [ line 124 - 127 ].
(2) reply to "2) secondly, it is hard to overlook the power of weak ties that has been already underlined in a number of recognized works, e.g. [2]. However, authors still prefer to focus on the degree - most likely to keep the adequateness with k-shell concept. However, could authors compare themselves against centrality(betweenness)-based metrics? Since it seems that this is missing. Moreover, I think that other methods in Table 1 are not described in the manuscript - this should be either made more clear in the manuscript (I did not find MDD and Cnc description) or described if it is not there.":
"The power of weak ties" does exist in some cases where local static indexes are applied. However, small-scale interaction becomes translated into large-scale patterns. Our method adopts the idea of decomposing network. Of this type, dynamic "decomposing" is the bridge of micro-macro. In each iteration, some relatively unimportant nodes are removed from network and the remaining nodes are relatively important. Actually, each iteration in dynamic decomposition process could be regarded as an inverse process of spreading. This is why we use this kind of mechanism. As for the comparison against centrality(betweenness)-based metric, Cnc method outperform betweenness centrality(BC) and closeness centrality(CC) in the specific networks (NetScience and Email) according to Figure.3 in [6]. However, our method is slightly better than Cnc in the similar networks. In addition, BC and CC are not suitable for large networks because of too much calculation.
The descriptions of MDD and Cnc are really a few. We supplement the descriptions in [ line 75 - 92 ]
(3) reply to "There are minor typos in the work, but careful proofreading should get rid of them. Please also make sure that you use spaces (" ") where needed, since sometimes the text is simply sticked, e.g. lines 112-113.":
We go through the paper again. I hope we've revised all of typos.
(4) We supplement some experiments not done before: (i)the spreading capacity measured by the SIR simulation in Power Grid [ subgraph (m) - (p) of Figure 4 ] ; (ii) the τ of DNSD and the τ of DNSD+ with different α [ subgraph (e) - (i) of Figure 5 ] ; (iii) the τ between the rankings obtained by our methods and node influence simulated by the SIR model. [ subgraph (c) - (f) of Figure 6 ] .
(5) We revise the first paragraph and supplement some content [line 32 - 36, line 46 -51]. Because some contents are similar to the paper by Joonyum (2014) .
Thanks again for your advice. It's very helpful to us.
[1] Liu Y , Tang M , Zhou T , et al. Identify influential spreaders in complex networks, the role of neighborhood[J]. Physica A: Statistical Mechanics and its Applications, 2016.
[2] Joonhyun, Bae, Sangwook, et al. Identifying and ranking influential spreaders in complex networks by neighborhood coreness[J]. Physica A: Statistical Mechanics and its Applications, 2014, 395(4):549-559.
[3] Ren Z M , Zeng A , Chen D B , et al. Iterative resource allocation for ranking spreaders in complex networks[J]. EPL (Europhysics Letters), 2014, 106(4):48005.
[4] Wang Z , Du C , Fan J , et al. Ranking Influential Nodes in Social Networks Based on Node Position and Neighborhood[J]. Neurocomputing, 2017, 260(oct.18):466-477.
[5] Bian T , Hu J , Deng Y . Identifying influential nodes in complex networks[J]. Physica A Statistical Mechanics & Its Applications, 2012, 391(4):1777-1787.
[6] Joonhyun, Bae, Sangwook, et al. Identifying and ranking influential spreaders in complex networks by neighborhood coreness[J]. Physica A: Statistical Mechanics and its Applications, 2014, 395(4):549-559.
Reviewer 2 Report
The authors propose a new method for ranking the node influence in complex networks, called "dynamical node strength decomposition" (DNSD). They state that DNSD can rank nodes according to their importance in diffusion phenomena, and to prove it, they compute their method together with other ranking methods for a set of network datasets.
However, there is a problem that invalidates, in my opinion, the results and consequently the conclusions of the work, and here is why. To evaluate the quality of a ranking method, the authors use the SIS model as a benchmark of an epidemic phenomenon, computing "the number of infected nodes in the end of spreading process ... to evaluate the influence of node" [line 131]. However, "the recovery possibility is set to 0" [line 132], so they actually use a SI model. In a SI model, the final number of infected nodes caused by a particular node corresponds exactly with the size of the component the node belongs to; and this number is the same for all the nodes of the component regardless of the characteristics of the node. Similar works that evaluate different centrality measures for identifying spreaders use the SIR model and compute the number of recovered nodes as the measure of the influence of a node (De Arruda 2014). The authors should correct this error and redo all the experiments and results before publishing this work
I have other comments, some of them not minors.
The authors should specify that they propose a ranking method for epidemic spreading. There are other spreading processes such as rumor dynamics that require other models to evaluate the performance.
The selected methods for comparing DNSD, such as the mixed degree decomposition (Zeng 2012) and the coreness centrality (Joonyum 2014) should be briefly explained. They only describe the k-shell method. At this point, it could be interesting to explain the difference between equation 1 of their method and the mixed degree decomposition method. The residual degree and exhausted degree used by Zeng are the existing degree and the vanishing degree used by the authors. Moreover, if I have understood DNSD, I think that the node strength computed at each step (equation 3) is nothing more than the sum of the mixed degree (Zeng) of the node (multiplied by the number of neighbors) and their neighbors (assuming beta equals to 0 as the authors do in their work); if so, it could be commented in the paper.
The Introduction section should be rewritten because now it is quite similar to the introduction of the paper by Joonyum (2014) and it could be considered plagiarism.
The results could be compared with the results by De Arruda (2014) who besides proposing a new centrality measure for the identification of influential spreaders, evaluates other centrality methods in random and spatial networks. Conclusions by De Arruda (2014) are quite important, there is no centrality measure that is the best because the results depend on the type of spreading process and the type of network.
References
De Arruda, G.F.; Barbieri, A.L.; Rodriguez, P.M.; Moreno, Y.; Costa, L.D.F.; Rodrigues, F.A. Role of centrality for the identification of influential spreaders in complex networks. Phys Rev E Stat Nonlin Soft Matter Phys, 2014, 90(3), 032812.
Zeng, A.; Zhang, C.J. Ranking spreaders by decomposing complex networks. Physics Letters A, 2012, 377(14), 1031-1035.
Joonhyun, B.; Sangwook;. Identifying and ranking influential spreaders in complex networks by neighborhood coreness. Physica A: Statistical Mechanics and its Applications, 2014, 395(4), 549-559
Author Response
Thanks for your advice. Here is our reply:
(1) reply to "However, there is a problem that invalidates, in my opinion, the results and consequently the conclusions of the work, and here is why. To evaluate the quality of a ranking method, the authors use the SIS model as a benchmark of an epidemic phenomenon, computing "the number of infected nodes in the end of spreading process ... to evaluate the influence of node" [line 131]. However, "the recovery possibility is set to 0" [line 132], so they actually use a SI model. In a SI model, the final number of infected nodes caused by a particular node corresponds exactly with the size of the component the node belongs to; and this number is the same for all the nodes of the component regardless of the characteristics of the node. Similar works that evaluate different centrality measures for identifying spreaders use the SIR model and compute the number of recovered nodes as the measure of the influence of a node (De Arruda 2014). The authors should correct this error and redo all the experiments and results before publishing this work":
Your correction is correct. SIR model should be adopted. We checked our code and find that we actually applied a model similar to SIR model but not SIR model. The mistake lies in the confusion of concepts between nodes that have been infected throughout the spreading process and nodes that are infected in one spread. And our description of the spreading model in the article is wrong. We revise these mistakes [ line 168 - 183 ].
We redo all the experiments [ line 224 - 283 ] and supplement some experiments not done before: (i)the spreading capacity measured by the SIR simulation in Power Grid [ subgraph (m) - (p) of Figure 4 ] ; (ii) the τ of DNSD and the τ of DNSD+ with different α [ subgraph (e) - (i) of Figure 5 ] ; (iii) the τ between the rankings obtained by our methods and node influence simulated by the SIR model. [ subgraph (c) - (f) of Figure 6 ] .
(2) reply to "The authors should specify that they propose a ranking method for epidemic spreading. There are other spreading processes such as rumor dynamics that require other models to evaluate the performance.":
It is true that we did not mention the specific types of spreading process. We specify that we propose a ranking method for epidemic spreading [ line 168 -171 ].
(3) reply to "The selected methods for comparing DNSD, such as the mixed degree decomposition (Zeng 2012) and the coreness centrality (Joonyum 2014) should be briefly explained. ":
The description of mixed degree decomposition (Zeng 2012) and coreness centrality (Joonyum 2014) are supplemented [ line 75 -92 ].
(4) reply to "At this point, it could be interesting to explain the difference between equation 1 of their method and the mixed degree decomposition method. The residual degree and exhausted degree used by Zeng are the existing degree and the vanishing degree used by the authors. Moreover, if I have understood DNSD, I think that the node strength computed at each step (equation 3) is nothing more than the sum of the mixed degree (Zeng) of the node (multiplied by the number of neighbors) and their neighbors (assuming beta equals to 0 as the authors do in their work); if so, it could be commented in the paper.":
We don't describe the algorithm very clearly. There are too many symbols about K in the manuscript. We want to prevent the confusion of symbols, but it makes the expression unclear. And the order of Equation 2 and Equation 1 is reversed. We rivise these mistakes [Equation 1, Equation 2, Equation 3] and corresponding description [ line 93 -116]. So the difference between DNSD and MDD is clear, which can be seen from the Figure 2 actually. First, we calculate weight of edge according to nodes at both ends. Then, the edges with weight are divided into exsiting edges and vanishing edges, and this method of weighting is consistent with MDD. Third, the neighbors are considered in Equation 3.
(5) reply to "The Introduction section should be rewritten because now it is quite similar to the introduction of the paper by Joonyum (2014) and it could be considered plagiarism.":
The first paragraph in the introduction of our article is really similar to the paper by Joonyum (2014) . We revise the first paragraph and supplement some content [line 32 - 36, line 46 -51].
(6) reply to "The results could be compared with the results by De Arruda (2014) who besides proposing a new centrality measure for the identification of influential spreaders, evaluates other centrality methods in random and spatial networks. Conclusions by De Arruda (2014) are quite important, there is no centrality measure that is the best because the results depend on the type of spreading process and the type of network.":
If I have understood the random walk accessibility proposed by De Arruda (2014), it seems very similar to the local spreading simulation from target node i within the specific range which is limited by the "length h" in paper. If the "h" is set to enough large, The computation could be huge. Except for degree centrality, methods compared with our method are based on decompsing networks, which have small computation.
Different methods are exactly suitable for different types of network. The network structure in real world is rarely so simple as star graphs, ring graphs or complete graphs. Finding out the structure of network and choosing the corresponding method is hard to practice. Because it's hard to define which structure a type of network has. And there are always multiple types of structures in a real network. However, this approach is worth trying.
So we revise partial description about conclusion [ line 272 -283; line 294 - 298] as " We can draw conclusion that the proposed methods are better than degree centrality, k-shell, MDD and Cnc methods in the specific types of networks, such as Email, Co-authors and Power Grid networks when SIR simulation is applied." Limit the conclusion to some conditions.
Thanks again for your advice. It's very helpful to us.
Round 2
Reviewer 1 Report
Thank you for providing your answers to my concerns. All of these have been addressed and I am suggesting to accept the manuscript in its present form.
Author Response
Thank you for taking the time to review our manuscript. Your opinion is very helpful to us.
Reviewer 2 Report
I still have doubts about the first comment of my review. I'm sorry but I do not understand the authors' response about the epidemic model they are using to measure the influence of nodes. I would like to ask them to answer the next questions.
(1) The authors say that:
"We checked our code and find that we actually applied a model similar to SIR model but not SIR model"
Then, what model are they using, i.e. SI, SIS, SIR, ...?
(2) In the manuscript the authors say:
"We apply susceptible-infected-susceptible model (SIS) to simulate spreading influence of nodes ... At each time step, each infected individual attempt to infect its neighbors with the probability theta ... In this paper the recovery possibility is set to 0 simply"
According to this description, the authors use a SI model. Nodes can be in two states, i.e. susceptible or infectious; an infection happens with probability theta, and an infectious node never recovers and continues being infectious forever. As I have explained in my first review, in a SI model, the final number of infected nodes caused by a particular node corresponds exactly with the size of the component the node belongs to. You do not need simulation to get to this conclusion, but if you want to check visually this result I suggest seeing the next Netlogo model of a virus spreading in a network (please, set the recovery-chance parameter to 0)
https://netlogoweb.org/launch#https://netlogoweb.org/assets/modelslib/Sample%20Models/Networks/Virus%20on%20a%20Network.nlogo
(3) In the unknown epidemic model used by the authors "At each time step, each infected individual attempt to infect its neighbors ... this process is repeated until each node has been infected". Then the authors use "the number of infected nodes in the end of spreading process is an indicator to evaluate the influence of node". In the response to my review, the authors add "the mistake lies in the confusion of concepts between nodes that have been infected throughout the spreading process and nodes that are infected in one spread". What does it mean "one spread"? What is the difference between "one spread" and "the spreading process"? As I have explained before, after "each node has been infected" the number of infected nodes is the size of the component of the first infected node.
The authors have roughly answered the other issues of my first review and I do not have more comments to do.
Author Response
Thanks for your advice. Here is our reply:
(1)In the last reply, "Reply (1)" is to explain why we made a mistake. We use SIR model now, as described at [ line168 -183 ] in the revision :
"We remark that there are various spreading processes on networks such as epidemic, rumor, information spreading and so on. In the previous works, the method of decomposing network is proved to be suitable for the epidemic spreading process. In this Letter, we apply susceptible-infected-recovered model (SIR) to simulate spreading capacity of nodes. In the SIR model, there are three kinds of individuals: (S) susceptible individuals that could be infected, (I) infected individuals who have the ability to infect susceptible individuals. Infected individuals are likely to recover with the probability η and become (R) recovered individuals who have immunity to disease and no ability to infect others. Initially, the spreading process starts with only one seed individual infected, and all other individuals are susceptible. At each time step, each infected individual attempt to infect its neighbors with the probability θ, then recover with the possibility η. The epidemic spreading process terminates when there is no infected individual in the network and the disease cannot spread anymore. In this paper the recovery possibility η is set to 1. And we take the average range of recovered population, µ(i), over a sufficiently large number of simulations, which are set to be 1000, as the indicator to evaluate the influence of node i."
(2)Reply to "The authors have roughly answered the other issues of my first review "
We re-show the changes in revision according to your fisrt review, as follows:
(a) reply to "The authors should specify that they propose a ranking method for epidemic spreading. There are other spreading processes such as rumor dynamics that require other models to evaluate the performance.":
It is true that we did not mention the specific types of spreading process. We specify that we propose a ranking method for epidemic spreading [ line 168 -171 ] in the revision:
"We remark that there are various spreading processes on networks such as epidemic, rumor, information spreading and so on. In the previous works, the method of decomposing network is proved to be suitable for the epidemic spreading process. In this Letter, we apply susceptible-infected-recovered model (SIR) to simulate spreading capacity of nodes."
(b) reply to "The selected methods for comparing DNSD, such as the mixed degree decomposition (Zeng 2012) and the coreness centrality (Joonyum 2014) should be briefly explained. ":
The description of mixed degree decomposition (Zeng 2012) and coreness centrality (Joonyum 2014) are supplemented [ line 75 -92 ] in the revision::
"The mixed degree decomposition (MDD) method notes that the dynamic process of decomposing network have influence on the resulting result. Some nodes are removed at each decomposition step and the edges linked to these nodes are removed too, which changes the degree of the remaining nodes. MDD defines a index named as the mixed degree that is the weighted sum of the residual degree (number of edges linked to the remaining nodes) and the exhausted degree (number of edges linked to the removed nodes). The nodes are removed according to the mixed degree in each step of MDD procedure. The effect of MDD method to the sample network in Figure 1 is shown in Table 1. This idea that considers the influence of the removed edge also adapted in our method. However, the edges are weighted based on the nodes at both ends.
The neighborhood coreness centrality method (Cnc) found that the ks of each node generated by the k-shell method reflects the information of location and a spreader with more neighbors located in the core of the network is more influential. Cnc adopts k-shell method to obtain the ks of each node first. Then, the neighborhood coreness centrality of target node is equal to the sum of its neighbors’ ks . The ranking list of node influence is generated based on the Cnc. There is a improved Cnc method named as the extended neighborhood coreness Cnc+, which denotes the ks of neighbors and 2-step neighbors as the metric of node influence."
(c) reply to "At this point, it could be interesting to explain the difference between equation 1 of their method and the mixed degree decomposition method. The residual degree and exhausted degree used by Zeng are the existing degree and the vanishing degree used by the authors. Moreover, if I have understood DNSD, I think that the node strength computed at each step (equation 3) is nothing more than the sum of the mixed degree (Zeng) of the node (multiplied by the number of neighbors) and their neighbors (assuming beta equals to 0 as the authors do in their work); if so, it could be commented in the paper.":
We don't describe the algorithm very clearly. There are too many symbols about K in the manuscript. We want to prevent the confusion of symbols, but it makes the expression unclear. And the order of Equation 2 and Equation 1 is reversed. We rivise these mistakes [Equation 1, Equation 2, Equation 3] and corresponding description [ line 93 -113] in the revision:
"In many works, the edges in unweighted networks usually are treated equally. This means that for different nodes, there are only differences in the number of edges, but no differences in features of edges. If there is only one edge between two nodes with a large number of edges, the importance of this edge is obviously vital. In this moment, it is unreasonable that this edge and other edges are treated equally. Actually, edges are spreading paths, which are different in the spreading process. Information could spread more widely through some specific paths. Thus, the importance of different edges should be distinguished, which depend on the nodes connected by the edges"
(d) reply to "The Introduction section should be rewritten because now it is quite similar to the introduction of the paper by Joonyum (2014) and it could be considered plagiarism.":
The first paragraph in the introduction of our article is really similar to the paper by Joonyum (2014) . We revise the first paragraph and supplement some content [line 32 - 36, line 46 -51] in the revision:
"This is caused by a variety of factors: first, the standard for a node to be removed is only its degree, which does not take local influence of its neighbors into account. Second, the process of removing nodes is recursive, and the order of nodes removed will affect the ranking result. Third, the global characteristics of the network are considered insufficiently and so on."
"Wang et al. utilizes neighborhood attributes and entropy method to weight the node position based on iteration information in the k-shell decomposition. All in all, decomposing networks to identify the node influence is advisable."
Thanks again for your advice. It's very helpful to us.