Article
Peer-Review Record

The Impact of Community Happenings in OpenStreetMap—Establishing a Framework for Online Community Member Activity Analyses

ISPRS Int. J. Geo-Inf. 2021, 10(3), 164; https://doi.org/10.3390/ijgi10030164
by Moritz Schott 1,*, Asher Yair Grinberger 2, Sven Lautenbach 3 and Alexander Zipf 1,3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 29 December 2020 / Revised: 28 February 2021 / Accepted: 6 March 2021 / Published: 14 March 2021

Round 1

Reviewer 1 Report

I am still very much impressed by the success and sustainability of OSM. Studies like the one presented here can help to ensure this success and sustainability.

Regarding the paper, I have a few comments:

The finding that advanced users are less affected by happenings than newcomers seems to be trivial.

I didn't fully understand the measures defined for the characteristics. How are the values given in figure 3 calculated? ("abstract values")

Figures 5-7 give a number of figures describing how the behaviour of the users has changed after a happening. My question is how significant these results are. It should be obvious that a happening has a positive effect on novices, but these numbers change if you take another group. But probably you wanted to demonstrate numerically that there are effects.

Did you investigate how the background of the people affects the results, e.g. characteristics like age, sex, or education?

Author Response

## Reviewer 1

Dear reviewer,
Thank you for providing us with a set of meaningful comments. These helped us identify that the contribution we see in this paper was not emphasized clearly enough. We revised the text accordingly and feel that the paper has much improved as a result. Furthermore, we found your point about Figure 3 to be very relevant and revised the figure to be more coherent. Please see our detailed responses below.

### The finding that advanced users are less affected by happenings than newcomers seems to be trivial.

We agree with the reviewer that these findings would probably have been expected by most researchers in the field. Yet an analysis of where these contributors are affected and where they are not was missing until now. Furthermore, we feel that the finding that advanced mappers changed neither their contribution type or quality nor their community integration is quite surprising. All of these measures could have been affected through cleanup work in the area of the event or as a by-product of the increased contribution quantity after events.

We have added more detail on what our expectations and hypotheses were regarding the influence on advanced mappers by enhancing figure 3. We also added more detail where we think our findings brought surprising new insights, even if these insights are 'non-effects' of happenings (see lines 571-578).

### I didn't fully understand the measures defined for the characteristics. How are the values given in figure 3 calculated? ("abstract values")

We have redesigned figure 3 and added a better explanation in the text as well as in the figure caption. We now emphasise more clearly that this figure represents our framework model through a hypothetical course of development of a hypothetical mapper taking part in an event (see figure 3 and lines 207-218).

### Figures 5-7 give a number of figures describing how the behaviour of the users has changed after a happening. My question is how significant these results are. It should be obvious that a happening has a positive effect on novices, but these numbers change if you take another group. But probably you wanted to demonstrate numerically that there are effects.

In case the above comment relates to statistical significance, please note that all findings we reported in this study were significant with a p-value smaller than 0.05. As we ran a total of about 500 statistical tests and the results section already seems quite dense, we chose to state our level of significance at the end of section 2.5 and to state the actual effect size in layman's terms rather than stating individual p-values. Figures 6-8 (previously 5-7) represent the most striking findings of our study, meaning the findings with the largest effect. We have added the plots for the remaining metrics to the supplementary materials.

In case the comment relates to the significance of the content, we want to stress that the subdivision of happening attendees into the presented groups of novice and advanced mappers, as well as by happening type, was chosen based on the literature, where these groups are commonly analysed separately and findings often differ. In addition, our previous knowledge in the field pointed towards different effects between these groups. The study concentrated on these two aspects as the research community as well as the OSM project seem to have a large knowledge gap in this area and at the same time will profit most from this analysis.
Of course, a large number of different mapper attributes, like home location or main mapping interest, may be used to separate contributors, which would also bring about new insights and possibilities.

We have added more possible future research opportunities regarding how the analyses of the effects of happenings among different mapper types may bring about valuable new insights (see lines 671-681).

 

### Did you investigate how the background of the people affects the results, e.g. characteristics like age, sex, or education?

During the analyses we examined secondary parameters like 'skill', 'income' and 'culture' and their effect on the results. We did not include these findings because, on the one hand, they would have overloaded the paper with new methods and results and, on the other hand, the impacts of these parameters on the results were rare, low or self-evident.

Regarding the parameters mentioned by the reviewer: these socioeconomic attributes are not available and cannot be inferred in a quantitative study on OSM. OSM also forbids automatic enquiries to request this information directly from the contributors, not only because of privacy issues. In a qualitative setup, e.g. through a survey or questionnaire, these insights would nevertheless provide great value to the OSM community.

We have added more possible future research opportunities to the outlook to address this comment (see lines 671-681).

Reviewer 2 Report

This is a well written and interesting paper.

Some points should be presented in more details or clarified.

Specific comments:

Line 190 - Section 2.3: It is not explained why "Physical Location" does not appear in this section as in section 2.2.

Line 248-249: Why only one year was investigated and why 2016 (that is 4 years ago) was selected?

Line 258 -260: It is important to comment on the number of users taking part in each event, which is more or less 15-20 people.

Line 260: Why users that contributed on the first half of 2016 were selected?

Table 2: the word "wise" is used only in some variables description

Line 264: Were descriptive attributes combined? From Table 2, it seems that they have been used independently. More information of these datasets should be provided in order to inform the reader of the classes existing in each one and their distribution in relation to the calculation of distances.

Line 268: two categories are examined for three event types not six

Table 3 needs to be described in detail. It is not obvious what each number represents, e.g. lines 1-5 distinguish users in terms of time of using OSM. Which n values sum up to 76? What do lines 5-10 report? Where are advanced users in this table?

Line 331: what is the one year interval?

Line 345: what is the two year period?

Line 386: enormous???

Line 388: organic???

Author Response

## Reviewer 2

Dear reviewer,

thank you for your detailed and accurate comments, pinpointing areas where incomplete explanations existed. Your review has enabled us to step back from our authors' blindness and add more detailed information for the actual reader. Especially your comments on the concrete implementation of the analyses as well as on a more precise phrasing of individual words helped us to better express our findings. Please find below our detailed responses.

### Line 190 - Section 2.3: It is not explained why "Physical Location" does not appear in this section as in section 2.2.

Thank you, we indeed owed that information to the reader. We have added a paragraph stating that we assume the physical location does not change due to an event (see lines 235-244). It is extremely unlikely that a user will move her permanent residence due to OSM.

### Line 248-249: Why only one year was investigated and why 2016 (that is 4 years ago) was selected?

The collection of events as well as the mapper extraction described in the paper require a considerable amount of manual work such as web-scraping and mapper account verification. This is mainly due to the fact that the OSM calendar is not standardised and is machine readable only to a very limited extent. The organizers of many events choose to state the date and country of the event but provide details on the exact location and type only within a linked pdf-file, custom website or social network. The data collection and computation for the presented study took place in early 2018, making 2016 the latest timestamp for a long-term analysis of two years.

We have added a summary of this information to the methods section (see lines 287-295).

### Line 258 -260: It is important to comment on the number of users taking part in each event, which is more or less 15-20 people.

Details on the mean number of users per happening as well as the actually observed numbers were added in the referenced paragraph (see lines 314-315).

### Line 260: Why users that contributed on the first half of 2016 were selected?

The users for the Control Group were randomly selected to best represent the 'general' non-event mapper in OSM, suitable for comparison with the two groups of happening users. To ensure maximum comparability, we selected users that contributed in the same temporal window in which the analysed events took place.

We added this information to the referenced paragraph (see lines 316-319).

### Table 2: the word "wise" is used only in some variables description

We have adapted the phrasing in Table 2 (now table 3).


### Line 264: Were descriptive attributes combined? From Table 2, it seems that they have been used independently. More information of these datasets should be provided in order to inform the reader of the classes existing in each one and their distribution in relation to the calculation of distances.

The phrasing was a residue from the technical implementation where indeed all four layers were intersected to create areas of equal attributes along all four dimensions. Yet each layer can in fact just as well be queried separately to extract the descriptive attributes of a contribution and they were in fact analysed individually.

The different datasets were described in more detail providing information on the available classes (see lines 341-348). Also the explanation on the calculation of the digital area was extended (see lines 260-263).

We have now also populated the linked repository with the supplementary materials. This repository contains the dataset used in the analyses.

 


### Line 268: two categories are examined for three event types not six

We rephrased this paragraph to make it more clear that the users were analysed in six groups defined by three event types and two mapper experience levels (see lines xx-xx).

### Table 3 needs to be described in detail. It is not obvious what each number represents, e.g. lines 1-5 distinguish users in terms of time of using OSM. Which n values sum up to 76? What do lines 5-10 report? Where are advanced users in this table?

We enhanced the grouping in table 2 to increase readability and moved and enlarged a paragraph from the results section to the data sources section to provide the reader with more context on table 2 already the first time it is referenced (see lines 351-352).


### Line 331: what is the one year interval?

We adapted the phrasing here to remind the reader of the meaning of the different time intervals that were analysed (see line 441).

### Line 345: what is the two year period?

We adapted the phrasing here to remind the reader of the meaning of the different time intervals that were analysed (see line 456).

### Line 386: enormous???

The line was rephrased and additional information was added to show what we mean by 'enormous': more contributions than the other groups (often >50), mainly creations (see lines 497-499).

### Line 388: organic???

We replaced the phrase by a more precise explanation of what we mean: field mappers could not be distinguished from the control group (a general representation of 'the OSM mapper') and therefore seem to be less specialised than remote mappers (see lines 505-507).

Reviewer 3 Report

The authors proposed an integrated framework to evaluate the effects of happenings on different types of contributors of OSM. The findings of newcomers and advanced mappers are interesting, and the explanations about these findings are rational and to some extent in line with reality. The overall quality of this work is good and exhibits a certain level of novelty. The comments from the reviewer are listed as follows:

 

  • The text and symbols in Figure 2(b) are hard to read, please polish them.
  • The virtual y-axis in Figure 3 is confusing to readers, a table along the timeline with a more explicit description might improve the readability.
  • How were the statistical tests applied in this work? The authors just mentioned these tests in line 283, but no results are found in the following contents. Please clarify it.
  • It might produce interesting findings by investigating the personal characteristics corresponding to those outliers in the boxplots of figure 5.

Author Response

## Reviewer 3

Dear reviewer,
Thank you for investing time into identifying and pointing out large and small issues in the paper. We have revised the manuscript in accordance with your comments, a process which helped us to better communicate our methods and results to the reader. Please see our detailed responses below. Also, thank you for the interesting suggestion about further analysis, which indeed could produce more knowledge. Below we detail some initial results and explain our choices regarding how and where to include them.

### The text and symbols in Figure 2(b) are hard to read, please polish them.

We adapted the figure to enlarge text and element sizes (see figure 2).

### The virtual y-axis in Figure 3 is confusing to readers, a table along the timeline with a more explicit description might improve the readability.

We have redesigned figure 3 and added a better explanation in the text as well as in the figure caption. We now emphasise more clearly that this figure represents our framework model through a hypothetical course of development of a hypothetical mapper taking part in an event (see figure 3 and lines 207-218).

### How were the statistical tests applied in this work? The authors just mentioned these tests in line 283, but no results are found in the following contents. Please clarify it.

We clarified in the mentioned paragraph that, for each time interval, all valid mappers in each of the two experience level groups were seen as individuals forming a statistical sample. The general Kruskal-Wallis rank sum test and, if applicable, the Wilcoxon test were applied to each sample individually, testing for differences in change attributes (or total mapping attributes for newcomers) between mappers of the three mapping groups. All results presented below are based on this statistical effect validation, meaning that any effect reported, as well as any missing effect reported, is based on the outcomes of the statistical tests at a significance level of 5%. The methods section now includes more details on these tests, clarifying the procedure we followed (see lines 367-388).
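
As a rough illustration of the kind of test described above (a minimal sketch, not the code used in the study), the Kruskal-Wallis rank sum test for three groups can be written with the Python standard library alone. The function names and the example values are hypothetical. The p-value uses the chi-square approximation, which for three groups (two degrees of freedom) reduces to exp(-H/2) and is only approximate for small samples. The pairwise Wilcoxon follow-up is omitted here.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total = 0.0
    offset = 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)


# Hypothetical change values for three mapping groups (assumed data).
control = [1, 2, 3]
field_mappers = [4, 5, 6]
remote_mappers = [7, 8, 9]

h_stat = kruskal_wallis([control, field_mappers, remote_mappers])
p_val = p_value_three_groups(h_stat)
print(f"H = {h_stat:.3f}, p = {p_val:.4f}, significant at 5%: {p_val < 0.05}")
```

In this sketch an effect would be reported only if the p-value falls below 0.05, mirroring the significance level stated above.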

 

### It might produce interesting findings by investigating the personal characteristics corresponding to those outliers in the boxplots of figure 5.

We thank the reviewer for this interesting suggestion. During the extraction of happening attendees we filtered bots to minimise outliers, but in any open project a fair number of outliers is still to be expected. We checked the OSM profiles of some of the outliers in the CRM group in figure 5. While the mappers themselves did not seem any different from the general OSM mapper, we found two possible explanations for their outlier position:

 - some of the outliers had a very small amount of total contributions. If only one of these few contributions was a tag change, the shares in figure 5 will shift in that direction
 - the other mappers we checked were rather active at that time. Yet they seem to have used the addr:* tag more often than normal CRM mappers. If these users added the addr:country or addr:city tag to a large number of buildings during the mapathon, this could result in the observed outlier pattern. Both pieces of information (country and city) may well be available to remote mappers.

Another possible explanation could be the distraction of these mappers from the actual mapping goal to a more local mapping where tag changes are more frequent as observed in the CFM group.

We chose not to include this information in the text for fear of distracting the reader from the main content. We did, however, add a note on the potential of these insights to the outlook (see lines 671-681).

Reviewer 4 Report

This manuscript is original in that it attempts to quantitatively evaluate the performance of OpenStreetMap contributions due to "happening-centered events". It is useful to see how the referenced cases are scrutinized and modeled in terms of their impact on the community. On the other hand, there are several points that need to be revised.

1. Previous works
For HOT events, why don't you cite the paper published in the "Nature scientific report" that the authors recently published: https://doi.org/10.1038/s41598-021-82404-z

For the discussion and notes for OSM, the existing papers may be helpful. https://doi.org/10.3390/ijgi9060372

2. Sources of analysis data
Since the analysis of the whole paper is highly abstract, it would be good if there is a list of collected events, such as Supplementary Materials. Nothing was archived in the link to Github, but disclosure would make the analysis data more transparent.

3. Analysis method

The authors point out that the CRMs are biased toward activities in Europe and the United States, and this is certainly readily apparent in the calendars and other documents analyzed. On the other hand, in terms of activities in Asia, it is unlikely that there are no such activities at all. Therefore, it would be good if future research could also consider how to examine the diversity of CRMs' activities.

For the abstract summary shown in Figure 3, it would have been good to describe a specific use case that is unique, such as the case where the largest number of contributors participated among the analysis targets. Without concrete examples, readers would not be able to clearly understand the difference between CRM and CFM.

Author Response

## Reviewer 4

Dear Reviewer,
Thank you for providing us with a set of very concrete and useful comments. We have integrated all of your suggestions into the paper, including the references you have suggested, the addition of supplementary data, integrating missing details, and making Figure 3 clearer. Please see our detailed responses below.

### Previous works

#### For HOT events, why don't you cite the paper published in the "Nature scientific report" that the authors recently published: https://doi.org/10.1038/s41598-021-82404-z

The paper was not available until after the submission of this manuscript. We have happily included it in the related work section as well as in our discussion (see lines 80 and 502).

#### For the discussion and notes for OSM, the existing papers may be helpful. https://doi.org/10.3390/ijgi9060372

Thank you for reminding us of that publication, which is well suited to support our notion of Notes as a community measure. We have added it to the materials and methods (line 126).


### Sources of analysis data

#### Since the analysis of the whole paper is highly abstract, it would be good if there is a list of collected events, such as Supplementary Materials. Nothing was archived in the link to Github, but disclosure would make the analysis data more transparent.

We have now updated the repository with the requested data and code as mentioned in the Supplementary Materials section.

We have also included a figure depicting the number and location of events (see figure 4).

### Analysis method

#### The authors point out that the CRMs are biased toward activities in Europe and the United States, and this is certainly readily apparent in the calendars and other documents analyzed. On the other hand, in terms of activities in Asia, it is unlikely that there are no such activities at all. Therefore, it would be good if future research could also consider how to examine the diversity of CRMs' activities.

The analyses contained events almost exclusively in Europe and the US, with one event in the Philippines being the only exception. We have added this detail to the text, also describing in more detail what we think is the reason for this bias (see lines 300-306 and figure 4). We also extended the discussion with more possible future research areas (see lines 498-504).

#### For the abstract summary shown in Figure 3, it would have been good to describe a specific use case that is unique, such as the case where the largest number of contributors participated among the analysis targets. Without concrete examples, readers would not be able to clearly understand the difference between CRM and CFM.

We have redesigned figure 3 and added a better explanation in the text as well as in the figure caption. We now emphasise more clearly that this figure represents our framework model through a hypothetical course of development of a hypothetical mapper taking part in an event (see figure 3 and lines 207-218).

We think that a figure explaining our model in more detail, including how we hypothesise changes in the trajectory of mappers over time, better communicates the outline of the new framework to the reader. This is required before discussing the concrete implementation, as well as the results and findings, in detail.

Concerning the difference between CRMs and CFMs we have added more information to section 1.1 where we introduce these two types of events (see lines 76-78).

Round 2

Reviewer 2 Report

The revised version of the paper can be published.
