Case Study on Privacy-Aware Social Media Data Processing in Disaster Management
Round 1
Reviewer 1 Report
The paper presents some research on applying a special algorithm to control the privacy of social media data that is collected, stored, and then analyzed by non-technical specialists during a crisis. The overarching research theme is an interesting and relevant topic on the use of social media for crisis situation analytics. However, as presented, the paper did not seem to produce any compelling results, and it was a little unclear what scenario the proposed research would be applicable to. The paper generally could have been strengthened by a much more compelling, specific example of how social media privacy has been compromised during crisis analytics. This was alluded to in some literature references, but it is an important point that should be clearly made to demonstrate the novelty of this research. Additionally, the researchers may consider using a different focus group, as the group that was used seemed to focus more on what their actual work is and how they find relevant information than on privacy concerns. The following are specific comments made when reading the paper, offered as suggestions to improve the manuscript and the overall research study.
The paper could be strengthened by a little more discussion on the actual privacy risks that are associated with social media storage by groups that monitor social media (as opposed to social media providers like Twitter). Typically APIs that offer access to social media do not provide specific user details. What are the exact risks and what are some specific examples of those risks being realized in terms of a citable case study?
The literature review section on the use of social media and disasters could be enhanced by also citing a few papers on rumor control and rumors that are propagated through social media during disasters.
Were the focus group participants aware of the fact that privacy can be an issue with social media? Do they already have some type of mechanism for accounting for privacy of information that may be used based on social media that they analyzed?
The findings of the discussion section didn't appear to include any discussion of how privacy issues are addressed by the people who volunteered to analyze social media. The section read more like a description of the general work these people undertake; it was not clear what the real contribution of the specific research questions is.
Author Response
> The paper could be strengthened by a little more discussion on the actual privacy risks that are associated with social media storage by groups that monitor social media (as opposed to social media providers like Twitter). Typically APIs that offer access to social media do not provide specific user details. What are the exact risks and what are some specific examples of those risks being realized in terms of a citable case study?
We have restructured and extended the section in order to better point out the privacy risks that arise from data retention by a third party. We also added a reference to Edward J. Eberle's standard work on the right of informational self-determination.
> The literature review section on the use of social media and disasters could be enhanced by also citing a few papers on rumor control and rumors that are propagated through social media during disasters.
Thank you for this hint. We have added two more important papers to the “Fake News” section: a 2014 paper by Kate Starbird that investigated misinformation after the bombing of the Boston Marathon, and a recent paper by Jan Kirchner and Christian Reuter that focuses on the acceptance of solution approaches.
> Were the focus group participants aware of the fact that privacy can be an issue with social media? Do they already have some type of mechanism for accounting for privacy of information that may be used based on social media that they analyzed?
We have now added two further sentences to make it clearer that the participants are aware of the topic. Nevertheless, established methods and mechanisms are still missing or incomplete. However, since the VOST helpers all participated in our case study voluntarily and with great commitment, we assume that they are highly motivated to meet this challenge in the future.
> The findings of the discussion section didn’t appear to include any discussion of how privacy issues are addressed by the people who volunteered to analyze social media. The section read more like a description of the general work these people undertake; it was not clear what the real contribution of the specific research questions is.
We added a paragraph with explanations about how participants addressed privacy issues before and after our technical introduction to the HLL specifics. This way we highlighted that there were different ideas about privacy aspects in general between participants and the authors.
Our contribution is addressed in section 4.2, where we point out that it is possible to apply the HLL data structure to the datasets the VOSTs work with.
Reviewer 2 Report
Preamble
This reviewer is not an expert on data management but has extensive experience in disasters, especially emergency evacuations from floods and tsunamis. Of relevance is the importance of accurate information and communications from official sources and inter-personal contacts during and after a disaster.
Aim of the Manuscript
The aim is “to avoid unnecessary data retention…, in order to prevent subsequent abuse, theft or public exposure of collected datasets and thus, protect the privacy of social media users.” The central question posed by the authors is: what opportunities and challenges can be identified for Virtual Operations Support Teams (VOST) to work with privacy-enhanced data and what potential implementation barriers can be detected? The authors consider the efficacy of an estimation algorithm, HyperLogLog (HLL), that stores data in a privacy-aware structure, such that they cannot be used for purposes other than the original intention and probe its application through a focus group approach of a VOST in Germany.
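For readers unfamiliar with HyperLogLog, the following Python sketch illustrates the general idea behind the estimation algorithm mentioned above: each item is hashed, and only the maximum "rank" observed per register is kept, so distinct items can be counted approximately without storing the items themselves. This is a minimal illustration under stated assumptions, not the implementation evaluated in the manuscript. The class name, the choice of SHA-1 as hash function, the register count of 2^12, and the example user IDs are all hypothetical.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative; names and parameters are assumptions)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                       # number of hash bits used as register index
        self.m = 1 << p                  # number of registers (4096 for p = 12)
        self.registers = [0] * self.m    # only these small integers are stored

    def add(self, item: str) -> None:
        # Derive a 64-bit hash from the first 8 bytes of SHA-1 (assumed hash choice).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                     # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        # Bias-correction constant, valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(10_000):
        hll.add(f"user-{i}")   # hypothetical user IDs
        hll.add(f"user-{i}")   # adding an item twice does not change the sketch
    print(round(hll.count()))  # approximate number of distinct users
```

With 4096 registers the expected relative error is roughly 1.6 percent. The registers do not contain the original identifiers, which is the property the privacy-aware storage approach builds on.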
Methodology
The literature review explains the fundamentals and presents relevant research results from the use of online media during disasters. The role of VOST is explained - digital volunteers who pursue the goal of integrating data from social media more effectively into decision-making processes and a closer connection to emergency management agencies. They describe aspects of privacy and data retention and “privacy-aware data”.
The methods section explains the authors’ design for focus group discussions (video conference) with two groups of VOST members in Germany to examine the feasibility of implementing HLL data in the VOST workflow and their sensitivity for the privacy aspect. Participants were presented with three hypothetical scenarios (corona virus) and were asked to discuss their respective approach in each of them, one after the other. To explain the advantages of HLL, the authors presented a sample dataset that could potentially serve as real data for a VOST operation, so that its members would feel familiar with it. The sample covered all German posts on Twitter containing the hashtag “#corona” (72.7 million posts) from January-May 2020.
Evaluation of Manuscript
The manuscript is appropriately structured, well written, thoroughly referenced and concise. Overall, the contents are of high quality. Section 4 presents the evaluation of the findings from the focus group. The authors demonstrate a high degree of critical reflection on the design of their survey (e.g. lines 311-316) and this is an important element of a good scientific enquiry. This research team is well positioned to continue to contribute to the emerging field of disaster informatics (e.g. lines 364ff). There are only a couple of comments.
First, the authors make reference to disasters, such as the earthquake in Haiti in 2010, the Elbe flood in 2013 or the urban flashflood in Münster in 2014, where “social media helped thousands of volunteers to spontaneously network with each other and to actively participate in disaster management (Sackmann et al. 2018)”. It would be informative to the reader given the topic is VOST to have a short description as to exactly how volunteers actively participated in “disaster management”.
Secondly. There are a few minor corrections as follows.
- 7 line 284 “…is te far most…” - the
Comes et al…pages missing
Line 432… The New York Times should be in italics
Line 443 …Forbes Magazine – Italics
Löchner, Marc..reference incomplete after LESSON
Miller, Vincent…SAGE place of publication
Prüfer, Peter, …reference incomplete
Rouse…TechTarget Network - italics?
WHO…place of publication? Presumably Geneva.
Author Response
> First, the authors make reference to disasters, such as the earthquake in Haiti in 2010, the Elbe flood in 2013 or the urban flashflood in Münster in 2014, where “social media helped thousands of volunteers to spontaneously network with each other and to actively participate in disaster management (Sackmann et al. 2018)”. It would be informative to the reader given the topic is VOST to have a short description as to exactly how volunteers actively participated in “disaster management”.
We added a more detailed explanation of how the volunteers participated, along with a reference to a (German) paper by Fathi et al.
> Secondly. There are a few minor corrections as follows.
> line 284 “…is te far most…” - the
> Comes et al…pages missing
> Line 432… The New York Times should be in italics
> Line 443 …Forbes Magazine – Italics
> Löchner, Marc…reference incomplete after LESSON
> Miller, Vincent…SAGE place of publication
> Prüfer, Peter, …reference incomplete
> Rouse…TechTarget Network - italics?
> WHO…place of publication? Presumably Geneva.
Thank you for these hints. Unfortunately, we are unable to implement some of them as requested. Although our bibliography contains all the information requested, we rely on the citation style that is set in the publishing template required by MDPI. The citation style defines, e.g., typographic features such as italics.
Reviewer 3 Report
Manuscript ID: ijgi-968093 entitled " Case study on privacy-aware social media data processing in disaster management ", has been reviewed by me but I feel the author(s) must improve different sections in the manuscript. My comments are as below:
1. In section 1, the author pointed out that social media plays a vital role in crisis communication and information collection. However, it is a pity that the author only pointed out the importance of "platform", but in a disaster, what is the important message? What is the challenge of message requests? What is the relationship between the message and privacy? The author did not point out.
2. In section 2, the author explained the initiation of crisis informatics research due to disaster management, and pointed out that volunteers and technical communities (V&TC) also face challenges in collecting, processing, and analyzing images from social media. But what is the challenge? Does it mean that the “crisis information” collection is a challenge? Or does it mean that the “collection of crisis information” is a challenge? What are the relationships between them with this manuscript? It is worthy of further explanation.
3. In the Data acquisition section, the meanings of hashtag, actual_count, hll_count, and hll_data provided in Figure 2 are not explained, so the figure is not suitable for all interested readers; the author(s) have to consider readers’ convenience.
4. In the Evaluation stage, there are too few descriptions of objective data. If it is quantitative data, there must be statistical results. If it is qualitative data, it must be supported by protocol analysis data of the discussion content...
Author Response
First of all, thank you very much for your remarks. Before we update the submission, could you please rephrase or specify the following parts:
> 1. In section 1, the author pointed out that social media plays a vital role in crisis communication and information collection. However, it is a pity that the author only pointed out the importance of "platform", but in a disaster, what is the important message? What is the challenge of message requests? What is the relationship between the message and privacy? The author did not point out.
We are not sure what you mean by "message". Please specify.
> 4. In the Evaluation stage, there are too few descriptions of objective data. If it is quantitative data, there must be statistical results. If it is qualitative data, it must be supported by protocol analysis data of the discussion content...
We are not sure what kind of data you are expecting. The case study is about a focus group discussion. We do have a log, but we cannot publish it because this has not been agreed with the participants, and furthermore it is in German. Please specify.
Round 2
Reviewer 1 Report
It was difficult to determine the actual changes made from the response letter. Recommend submitting a version of the article showing specific changes that were made.
Author Response
We have updated the paper again, according to the third reviewer's remarks. The amendment of this submission includes a full set of differences between the original submission and the current version.
Author Response File: Author Response.pdf
Reviewer 3 Report
Manuscript ID: ijgi-968093 entitled "Case study on privacy-aware social media data processing in disaster management" has been revised by the author(s), but I feel there are still clear flaws that the author(s) did not address. My comments are below:
1. The title of the article talks about privacy-aware, but who really requests the importance of privacy? Is it a user of social media? Or an analysis team of social media?
The findings from the research show that in addition to the training program in the analysis team, the deployment of HyperLogLog in the data collection process will not distract the attention of the data analysis process.
So it seems that the privacy-awareness is aimed at the analysis team, not the general user. However, the data input in disaster management comes from the general user. What is the correlation between privacy-awareness and disaster management when data is input by the general user? What is the correlation between privacy-awareness and disaster management for the data analysis team? The motivation of this research is not clear; the reviewer cannot find it in section 1.
And it seems also to affect the scope of the literature review in section 2.
2. In section 3, researchers and VOST members discussed the feasibility of implementing HLL data and its sensitivity to privacy.
2.1 What are the research questions? How do the contents discussed relate to those research questions? I didn't see the author's explanation at all.
2.2 All three hypothetical scenarios had the pandemic spread of the Corona virus as their basic operational situation, and only differed in the location level of operation: national, regional, and local. What is the purpose of the three regional distinctions? How do the scenarios relate to the research questions? Again, no explanation is given.
2.3 In the end, only two PostgreSQL databases are compared, one for HLL-processed data and one for the plain data, to compare against. In this way, only HLL-processed data and plain data can be compared.
Therefore, almost no normal inference process can be seen, and the final results and findings are unreliable.
Author Response
> Manuscript ID: ijgi-968093 entitled "Case study on privacy-aware social media data processing in disaster management" has been revised by the author(s), but I feel there are still clear flaws that the author(s) did not address. My comments are below:
> 1. The title of the article talks about privacy-aware, but who really requests the importance of privacy? Is it a user of social media? Or an analysis team of social media?
> The findings from the research show that in addition to the training program in the analysis team, the deployment of HyperLogLog in the data collection process will not distract the attention of the data analysis process.
> So it seems that the privacy-awareness is aimed at the analysis team, not the general user. However, the data input in disaster management comes from the general user. What is the correlation between privacy-awareness and disaster management when data is input by the general user? What is the correlation between privacy-awareness and disaster management for the data analysis team? The motivation of this research is not clear; the reviewer cannot find it in section 1.
> And it seems also to affect the scope of the literature review in section 2.
Thank you for your remarks. We have updated our paper again accordingly. You can see the differences in the amendment of this submission.
Additionally, we would like to draw your attention to the following passages in the text:
In the Introduction [L29-L34] we describe the correlation between privacy-awareness and disaster management from a social media user's point of view and we outline examples of critical data [L31].
The correlation between privacy-awareness and disaster management from an analyst's point of view is outlined [L47-L54].
Our main point here is the problem of data retention, which is explicitly named, together with an explanation of why it can be dangerous.
We amended a paragraph [L45-L48] in which we emphasize the severity of privacy aspects for users of social media services.
We also point out [L55-L61] the necessity for privacy-aware data storage methods and an already existing working example.
We point to prior research [L62-L68] and note that questions about privacy and VOST have not yet been considered.
Subsequently, we explicitly name our research questions, starting on line 67.
They can be summarized as: can VOSTs work with HLL-processed data the same way they can work with raw data?
The Fundamentals subsection 2.3 [L187-L193] again explains the correlation between privacy-awareness and disaster management, both from a user's and the VOST's perspective, bundled with further explanations and references.
In addition, we restructured and amended subsection 2.4. We point to the disparity in the privacy aspect [L210-L212] and refer to our research question [L238].
> 2. In section 3, researchers and VOST members discussed the feasibility of implementing HLL data and its sensitivity to privacy.
>
> 2.1 What are the research questions? How do the contents discussed relate to those research questions? I didn't see the author's explanation at all.
We updated the Methods section and refer to the research motivation [L245-L250]. We take the research questions up again in the Evaluation in lines 294-296. The Conclusion sums up motivation and methods by rephrasing the research questions [L396-L400].
> 2.2 All three hypothetical scenarios had the pandemic spread of the Corona virus as their basic operational situation, and only differed in the location level of operation: national, regional, and local. What is the purpose of the three regional distinctions? How do the scenarios relate to the research questions? Again, no explanation is given.
Thank you especially for pointing out this gap. We added some more points to emphasize the reasoning behind the three scenarios [L280-L285].
> 2.3 In the end, only two PostgreSQL databases are compared, one for HLL-processed data and one for the plain data, to compare against. In this way, only HLL-processed data and plain data can be compared.
> Therefore, almost no normal inference process can be seen, and the final results and findings are unreliable.
The technical evaluation of the privacy-aware social media data storage method has been published by Dunkel, Löchner and Burghardt [1] and is out of scope of this work.
In this case study we focused on the feasibility of deploying that method to a real-life example use case.
[1]: Alexander Dunkel, Marc Löchner, and Dirk Burghardt. 2020. “Privacy-Aware Visualization of Volunteered Geographic Information (VGI) to Analyze Spatial Activity: A Benchmark Implementation.” ISPRS International Journal of Geo-Information.
Author Response File: Author Response.pdf