Inferring Urban Social Networks from Publicly Available Data
Round 1
Reviewer 1 Report
The content of the article is presented competently and corresponds to the research plan. The research methods are sufficiently disclosed, and the results are transparent. My main comment is that the text is very long, which distracts the reader and does not help in understanding the main idea of the study. Also, in my opinion, formula (2) needs to be justified in more detail.
Author Response
We kindly thank the Reviewer for his/her comments. We do recognize that the paper is a bit wordy, but we wanted to treat all aspects of the model in detail. Readers interested in a quick grasp of the highlights of the model may refer to the conference paper, which this journal version extends and which is already cited in the submitted manuscript. Concerning formula (2), we modified and extended the text surrounding the formula to improve its interpretability.
In the revised manuscript, all added/modified text is highlighted in blue.
Reviewer 2 Report
In "Inferring urban social networks from publicly available data", the authors present a data-driven model for urban social networks, which they also implement and release as open-source software. The model constructs an age-stratified and geo-referenced synthetic population whose individuals are connected by strong ties due to either household membership or friendship. In the model, household links are data-driven, while friendship links are based on a probabilistic model. With this model, the demographic and geographic factors governing the structure of the obtained network are studied for three Italian cities of different sizes.
I very much enjoyed reading this paper. I find it comprehensive and clearly written. It introduces new, timely, and important results on how best to infer urban social networks from a combination of data and modeling, and on how this could be applied to actual problems, which will surely also inspire future research along these lines.
The following comments should be taken into account if a revision at Future Internet is granted.
1) In the introduction, when referring to possible applications related to COVID-19, the papers "City size and the spreading of COVID-19 in Brazil", PLoS ONE 15, e0239699 (2020), and "The impact of human mobility networks on the global spread of COVID-19", J. Complex Netw. 8, cnaa041 (2020), would be useful references on the application of better network-inference methods.
2) It would also improve the paper if the figure captions were made more self-contained. In addition to very briefly stating what is shown, one should also consider a sentence or two saying what the main message of each figure is.
3) I also feel like there are too many figures for the amount of text, and more importantly, some figures provide rather redundant information. I understand the authors want to present a comprehensive analysis, but perhaps a reconsideration of what really must be shown for the main message to be supported would be in order. Some figures could also be moved to a supplementary material.
4) Some references contain errors, inconsistent formatting, and wrong data. Entries for journals without page numbers are routinely missing the article number. The authors should correct this with the greatest care.
If a revision is granted, I will be happy to review the revised manuscript.
Author Response
We are happy to know that the Reviewer enjoyed reading the paper and that he/she appreciated our work. We thank him/her for his/her valuable comments, which we addressed as follows:
1) We added a brief mention in the Introduction of the fact that the structure of the network has a direct impact on any epidemic process occurring on it, citing one of the suggested papers in support. Although we do not specifically consider COVID-19 or disease spreading in this paper, we surely will in the near future, and we will give the suggested articles careful consideration.
2) For all figures and tables contained in the main text, we summarized the information conveyed by the figure/table in the corresponding caption.
3) We acknowledge the abundance of figures, but we preferred to keep all of them for the sake of comprehensiveness. Appendices C and D, where all figures already presented in the main text have been reproduced for the cities of Viterbo and Sabaudia, could indeed be moved to supplementary material, but we leave the decision to the Editors.
4) We reviewed and corrected all bibliographic entries.
In the revised manuscript, all added/modified text is highlighted in blue.
Reviewer 3 Report
line 176, 'there is not much research'
When the authors state, 'Again, there is no much research about data-driven spatial social network models, in which the location of the individuals can be retrieved from real data.', please look at:
-Olteanu, M., Randon-Furling, J., & Clark, W. A. (2019). Segregation through the multiscalar lens. Proceedings of the National Academy of Sciences, 116(25), 12250-12254.
-Olteanu, M., de Bezenac, C., Clark, W., & Randon-Furling, J. (2020). Revealing multiscale segregation effects from fine-scale data: A case study of two communities in Paris. Spatial Demography, 1-13.
For your components and discussion on the 'constraints' on the population activities, please also look at:
-Malmberg, B., & Clark, W. A. (2020). Migration and neighborhood change in Sweden: The interaction of ethnic choice and income constraints. Geographical Analysis.
Could you also discuss the merits of the complete stochastic approach to the structural aspect of the problem discussed in:
-Ellam, L., Girolami, M., Pavliotis, G. A., & Wilson, A. (2018). Stochastic modelling of urban structure. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 474(2213), 20170700.
Discuss the possibility of using a graph convolutional neural network to analyze the data of the problem to uncover the different types of residents (this paper also looks at weak and strong ties, as yours does):
-Bidoki, N. H., Mantzaris, A. V., & Sukthankar, G. (2020). Exploiting weak ties in incomplete network datasets using simplified graph convolutional neural networks. Machine Learning and Knowledge Extraction, 2(2), 125-146.
E.g., can the proximity be modeled in the adjacency matrix?
For |Vi|.|Vj|, it is unclear what this operation is performing
Will eqn 2 be normalized in the summation to be a probability and is it not a likelihood?
What is the 'mixing' value of M(i,j)? it is unclear
It is not obvious why the similarity comparison can be made in eqn 6.
2.3.1: the equation presented needs an eqn number, and the notation for the log-normal is not typical; please look at https://en.wikipedia.org/wiki/Log-normal_distribution or https://mathworld.wolfram.com/LogNormalDistribution.html
It is also far from clear how this imposes a type of constrain restriction upon the
In Figure 2, the meaning of the title is not clear to the reader; please expand on it in the figure caption.
Fig. 4: please arrange the plots in a 3×2 grid, with the homogeneous cases in one column and the real cases in the other.
Fig. 9: axis labels would be beneficial, so that the ages can be read as well.
Author Response
We thank the Reviewer for his/her dedication and for the many valuable comments and suggestions, which we addressed as follows:
- We corrected line 176, as indicated.
- We do recognize that the original submission gave too little consideration to previous work that proposes data-driven analysis of urban populations/territories without directly using a network-based model. We therefore added a paragraph at the end of the Related Work Section to make up for it, in which 3 of the suggested papers are mentioned and briefly discussed. In particular, we highlighted that other modeling approaches are possible, which reveal multi-scale patterns of cohesion/segregation and make it possible to measure the impact of distance upon individual choices based on a cost-benefit analysis.
- While appreciating the suggestion, we decided not to discuss the use of graph-based neural networks to classify the individuals of our population. Indeed, the mentioned paper assumes that the network structure is part of the available data, and addresses the task of using this information to infer a set of labels for the vertices of the network. Our paper addresses the somewhat opposite task of inferring a network structure based on the available socio-demographic data. That said, the suggested network-based classification may be a valuable instrument to assess the quality of the inferred network, by verifying whether our network is capable of correctly encoding information about the given population (such as geographical proximity, as suggested). This falls, however, beyond the scope of the current paper and will be left to future work.
- We clarified (right after using |Vi|.|Vj| to compute m_i,j) that |V_i| is the cardinality of V_i. We believe it is now self-explanatory that the dot denotes the product.
- Equation (2) is not normalized to sum to 1 over all possible u,v. This type of normalization would be necessary if we extracted a fixed number of edges, each chosen independently at random from the set of possible edges. Instead, we iterate over all pairs and decide whether the edge (u,v) exists with probability Pr[u,v]. We added a brief paragraph before and after eq. (2) to clarify that the existence of (u,v) is a Bernoulli random variable with parameter Pr[u,v]. We also specified that Pr[u,v] is defined so as to guarantee, by construction, that a few structural properties of the graph are preserved.
- The terminology "mixing" and "total mixing" to denote the quantities M(i,j) and M were unnecessary and possibly confusing. We thus removed them and we partially reorganized the discussion of the properties of the friendship graph present right after eq. (2).
- We added an explanation of how the approximation in eq. (6) is derived.
- We numbered the equation in 2.3.1.
- As far as we know, the typical notation is just Lognormal(mu,sigma^2), but we needed a shorter notation, especially for the occurrence of the Lognormal in tables and figures. We thus kept this "atypical" notation in the revised manuscript, hoping for the Reviewer's understanding.
- We extended the caption of Figure 2, also taking into account the request by another Reviewer.
- We have "transposed" Figure 4 (and all following analogous figures), as requested.
- We added the axis labels in Figure 9 to make it clear which block corresponds to which age.
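As an illustration of the point on eq. (2) above, the following is a minimal Python sketch of per-pair Bernoulli edge generation. It is not the released software: `sample_edges` and `toy_prob` are hypothetical names, and `toy_prob` is an arbitrary placeholder that does not reproduce eq. (2). The sketch only shows that, when each pair is decided by its own Bernoulli trial, the probabilities need not sum to 1 over all pairs; their sum is the expected number of edges.

```python
import itertools
import random


def sample_edges(nodes, pair_prob, rng):
    """Run one independent Bernoulli trial per unordered pair (u, v)."""
    edges = []
    for u, v in itertools.combinations(nodes, 2):
        # The edge (u, v) exists with probability pair_prob(u, v).
        if rng.random() < pair_prob(u, v):
            edges.append((u, v))
    return edges


def toy_prob(u, v):
    # Hypothetical placeholder probability (NOT eq. (2) of the paper):
    # it simply decays with the difference between the integer labels.
    return 1.0 / (1 + abs(u - v))


rng = random.Random(0)  # fixed seed, for reproducibility only
nodes = list(range(6))
edges = sample_edges(nodes, toy_prob, rng)

# Sum of the per-pair probabilities: the expected number of edges,
# which is in general different from 1.
expected_edges = sum(
    toy_prob(u, v) for u, v in itertools.combinations(nodes, 2)
)
```

Under these assumptions, `expected_edges` exceeds 1, which is consistent with treating Pr[u,v] as the parameter of a per-pair Bernoulli trial rather than as a distribution over the set of possible edges.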
In the revised manuscript, all added/modified text is highlighted in blue.
Round 2
Reviewer 2 Report
The authors have revised their manuscript comprehensively and with great attention to detail. I warmly recommend publication in its present form.