Estimating Residential Property Values on the Basis of Clustering and Geostatistics
Round 1
Reviewer 1 Report
The paper is quite interesting.
In my opinion, the methodological approach is not particularly innovative, but the author has used rather common tools in a coherent way and with good results.
The literature background is also substantial and comprehensive.
I recommend publication in this form.
Author Response
The author thanks the anonymous reviewer for his work and the attention given to this paper.
The text was verified by a native speaker, an English lecturer.
Reviewer 2 Report
Please see the annotations in the attached PDF file.
Comments for author File: Comments.pdf
Author Response
The author thanks the anonymous reviewer for their work and the attention given to this paper.
The comments I received helped to improve this article considerably and will also help me in future research.
Following the editor's suggestions, the new parts of the article are written in red.
The text was verified by a native speaker, an English lecturer.
Author Response File: Author Response.docx
Reviewer 3 Report
The Author presented research relevant to the implementation of automatic models for real estate valuation. Generally, the scope of the research is very up to date and lively in wider international discussions. However, the specific results and approach presented belong to an already well-studied field, the analysis is very basic, and I cannot see any novelty. Moreover, the findings considered by the Author as new or innovative were overthrown to some extent 15 years ago by researchers working on property market analyses. Therefore, just a few of the reviewer's major doubts about specific assumptions and results are highlighted:
Ver.61. – "The multidimensional nature of geographical space is usually ignored and the impact of a property’s situation reduced to an analysis of its location and neighbourhood, treated as environmental features."
This statement seems to be a misconception. First, it is not true that geographical space is usually ignored; the Author even confirms this by citing many authors who have accounted for it. Furthermore, the Author claims that researchers take into account only location and neighbourhood, rather than the nature of geographical space, in real estate market analyses. The question is: what, then, is the nature of geographical space, and how does it differ from location and neighbourhood?
Additionally, the Author did not actually define any location or neighbourhood feature, and tries to classify the real estate market using coordinates alone, which is illogical when the goal is to create a real estate valuation chart (i.e. a valuation map). It seems that the Author did not consider fundamental knowledge about the real estate market, a specific domain of analysis in which we can observe uncertainty, imprecise property information, sudden and unpredictable changes, the absence of homogeneous functional dependencies between real estate attributes, significant differences between properties, etc.
Ver.77 – "The most frequently used cartographic method are isolines or choropleth maps." and ver. 83.
What is the basis for such a statement?
Ver. 88 – This is certainly not a new approach; see e.g. Maclennan, D & Tu, Y 1996, 'Economic perspectives on the structure of local housing systems' Housing Studies, vol. 11, no. 3, pp. 387-406, and many others.
Ver.215 – The paper should include the formula used to update the transaction prices and the verification of the model. Moreover, the Author did not present specific information about the attributes and their domains. This has a direct influence on the results of the selected method and their quality.
Ver. 221 – One of the attributes is the "standard of the flat". In fact, no such attribute exists in the national property registry, so the question is: how did the Author obtain this attribute for 1873 transactions from the years 2007-2011?
Ver. 218 – A verification of the significance of the correlation results should be presented.
Ver. 228 – The k-means clustering method has many limitations that the Author did not mention, e.g. the diversity of the analysed variables, the coding of variables, and the distribution of the variables.
Ver. 264 – "In the other clusters, a normal distribution of prices was obtained by removing outlying data." The Author should present the method used to remove outliers. It should be highlighted that outliers must be treated carefully because they carry additional (valuable) information about the market.
Figure 3 – Without georeferencing and topographic information the figure is not readable, and no conclusion can be drawn from it.
Ver. 336 – "The method used guarantees that within each of the zones reliability in the estimated property value is greater than 90%."
The Author bases this bold conclusion on Table 6, where the proposed approach is verified on only a few percent of the observations. The approach has been analysed on a single market and an outdated database, which is not very reliable.
The main conclusion is that the analyses are too modest and oversimplify a domain that is unusually complex. Interpreting the market area in a "continuous way" (Fig. 3) on the basis of sparse data points (transactions) is very controversial. It is possible, but requires very sophisticated analysis and research.
Although the selected themes are relevant to the wider international debate, I cannot recommend this paper for publication in such a recognized, high-level scientific journal as Geosciences.
Author Response
The author thanks the anonymous reviewer for their work and the attention given to this paper.
The many deep and critical comments I received helped to improve this article considerably and will also help in future research.
Following the editor's suggestions, the new parts of the article are written in red.
1. The Author presented research relevant to the implementation of automatic models for real estate valuation. Generally, the scope of the research is very up to date and lively in wider international discussions. However, the specific results and approach presented belong to an already well-studied field, the analysis is very basic, and I cannot see any novelty. Moreover, the findings considered by the Author as new or innovative were overthrown to some extent 15 years ago by researchers working on property market analyses. | The elaborated approach is based on the assumption that the non-spatial and spatial attributes of a property should be analysed separately. As a result, a location-insensitive model is developed in the first stage, based only on the properties’ structural characteristics. Using data mining techniques, in particular k-means clustering, clusters of similar properties are formed, and these create submarkets. Each cluster is characterised by defined attribute values, expressed on a rank scale. In the second stage, a purely spatial model is developed, based on the assumption that the value of a property in each cluster depends exclusively on the distance between properties with known prices and the property being valued. A literature review shows that some authors have opted for the use of geostatistical methods in mass appraisal (Basus, 2000), with some of them using the segmentation method of property market analysis (Maclennan et al, 1996). Among the geostatistical methods, the simplest varieties of kriging have been used. Cichociński (2009) attempted to apply geostatistical methods, namely simple kriging, to interpolate property values, while Ligas (2009) as well as Colaco and Vucetic (2012) applied the regression-kriging model, called the hybrid model, to estimate the value of land.
Other, more advanced methods of data processing and interpolation, including data integration methods such as cokriging, have rarely been used, the publication by Chica-Olmo (2007) being one of the few examples. None of these methods has shown sufficient accuracy to significantly affect the methods of property valuation. The application of a two-stage approach to the mass valuation of properties, using the k-means method in the first stage and the geostatistical method in the second stage, is a novel methodology, especially for local properties.
The number of articles indexed in the WoS and Scopus databases on “Mass appraisal and geostatistics” equals 1 (from 1900 to 2019). The number of articles indexed in the WoS and Scopus databases on “Mass appraisal and k-means method grouping” equals 1 (from 1900 to 2019). The number of articles indexed in the WoS and Scopus databases on “Mass appraisal, geostatistics, and k-means method grouping” equals 0 (from 1900 to 2019). |
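For illustration only, the first stage described in this response (grouping properties by rank-scaled structural attributes with k-means) could be sketched as below. This is a minimal sketch and not the author's implementation: the function name `kmeans`, the three attribute columns, the sample values, and the evenly spaced choice of initial centroids are all assumptions made for the example.

```python
def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means on equal-length numeric tuples.

    Initial centroids are taken at evenly spaced positions in the input
    list (an assumption that keeps the sketch deterministic).
    """
    n = len(points)
    centroids = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the partition is stable
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical rank-scaled attributes per flat: (area rank, standard rank, floor rank).
flats = [(1, 1, 1), (1, 2, 1), (2, 1, 1), (5, 5, 4), (5, 4, 5), (4, 5, 5)]
labels, centroids = kmeans(flats, k=2)
print(labels)
```

Each resulting cluster would then be treated as a submarket and passed separately to the spatial (kriging) stage.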
2. Ver.61. – “The multidimensional nature of geographical space is usually ignored and the impact of a property’s situation reduced to an analysis of its location and neighbourhood, treated as environmental features.”
This statement seems to be a misconception. First, it is not true that geographical space is usually ignored; the Author even confirms this by citing many authors who have accounted for it. Furthermore, the Author claims that researchers take into account only location and neighbourhood, rather than the nature of geographical space, in real estate market analyses. The question is: what, then, is the nature of geographical space, and how does it differ from location and neighbourhood? | The answer to these doubts is given in lines 340-345:
“In the hedonic model, the most commonly-used one for estimating property values, spatial position (location) is taken into account indirectly, by determining the accessibility and neighbourhood. Accessibility is measured as distance from the centre, in line with the location theory developed by von Thünen, and neighbourhood is usually understood as the property’s purpose in the land development plan. The proposed model takes into account location, analysed in accordance with the rules of geostatistics and interpolation using the kriging model.” |
3. Additionally, the Author did not actually define any location or neighbourhood feature, and tries to classify the real estate market using coordinates alone, which is illogical when the goal is to create a real estate valuation chart (i.e. a valuation map). It seems that the Author did not consider fundamental knowledge about the real estate market, a specific domain of analysis in which we can observe uncertainty, imprecise property information, sudden and unpredictable changes, the absence of homogeneous functional dependencies between real estate attributes, significant differences between properties, etc.
| Kriging assumes that the distance or direction between sample points reflects a spatial correlation that can be used to explain variation in the surface. The kriging tool fits a mathematical function to a specified number of points, or to all points within a specified radius, to determine the output value for each location. Kriging is a multistep process; it includes exploratory statistical analysis of the data, variogram modelling, creating the surface, and exploring a variance surface. The assumption is that, when kriging is used, there is no need to define location or neighbourhood characteristics for use in the k-means grouping. |
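To make the kriging step in this response more concrete, the following is a minimal sketch of ordinary kriging at a single location, assuming an exponential semivariogram. It is not the tool the author used: the variogram parameters (`sill`, `rng`, `nugget`) are illustrative rather than fitted, and the sample coordinates and prices are invented for the example.

```python
import math


def exp_variogram(h, sill=1.0, rng=500.0, nugget=0.0):
    """Exponential semivariogram; the parameters are illustrative, not fitted."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ordinary_kriging(samples, target, variogram=exp_variogram):
    """samples: list of (x, y, price); target: (x, y). Returns (estimate, weights)."""
    n = len(samples)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Semivariances between samples, bordered by the unbiasedness constraint
    # (the extra row and column force the weights to sum to 1).
    a = [[variogram(dist(samples[i], samples[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(s, target)) for s in samples] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    estimate = sum(w * s[2] for w, s in zip(weights, samples))
    return estimate, weights


# Hypothetical transactions: (x in m, y in m, unit price).
samples = [(0, 0, 3000.0), (100, 0, 3200.0), (0, 100, 2900.0), (120, 130, 3500.0)]
estimate, weights = ordinary_kriging(samples, (50, 50))
print(round(estimate, 1), [round(w, 3) for w in weights])
```

With a zero nugget the estimate at a sampled location reproduces the observed price, which is why the spatial model depends only on distances to properties with known prices.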
4. Ver.77 – “The most frequently used cartographic method are isolines or choropleth maps.” and ver. 83.
What is the basis for such a statement? | A thorough analysis of the cartographic presentation methods used to develop maps of property values was conducted in the author's doctoral dissertation. A reference to it has been added in the revised article. |
5. Ver. 88 – This is certainly not a new approach; see e.g. Maclennan, D & Tu, Y 1996, 'Economic perspectives on the structure of local housing systems' Housing Studies, vol. 11, no. 3, pp. 387-406, and many others. | In the article 'Economic perspectives on the structure of local housing systems', Maclennan & Tu examine the notions of market and sub‐market in the context of housing. The article first proposes specific definitions and then clarifies why the general characteristics of housing are likely to generate sub‐markets. The present article uses this division into sub-markets. Its aim, however, is to develop a two-stage methodology for estimating property values using the k-means and geostatistical methods, which is a new approach. The number of articles indexed in the WoS and Scopus databases on “Mass appraisal, geostatistics, and k-means method grouping” equals 0 (from 1900 to 2019). |
6. Ver.215 – The paper should include the formula used to update the transaction prices and the verification of the model. Moreover, the Author did not present specific information about the attributes and their domains. This has a direct influence on the results of the selected method and their quality. | A table with the attributes and their domains has been added on page 6 of the revised text. Additional information about the methods of updating transaction prices has been added on pages 3 and 6. |
7. Ver. 221 – One of the attributes is the “standard of the flat”. In fact, no such attribute exists in the national property registry, so the question is: how did the Author obtain this attribute for 1873 transactions from the years 2007-2011? | The attribute was taken from the Polish registry of property prices. |
8. Ver. 218 – A verification of the significance of the correlation results should be presented. | The p-value has been added to Table 2 (page 6). |
9. Ver. 228 – The k-means clustering method has many limitations that the Author did not mention, e.g. the diversity of the analysed variables, the coding of variables, and the distribution of the variables. | The author has added the limitations of the k-means method on page 11 (lines 355-360). |
10. Ver. 264 – “In the other clusters, a normal distribution of prices was obtained by removing outlying data.” The Author should present the method used to remove outliers. It should be highlighted that outliers must be treated carefully because they carry additional (valuable) information about the market. | More information on removing outlying data has been added. It is clear that outliers constitute additional information about the property market. However, only typical properties were used in this case. |
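The record does not state which outlier rule was applied, so the following is only one common possibility, shown as a hedged sketch: Tukey's 1.5 × IQR fences computed per cluster. The function name `remove_outliers_iqr`, the fence multiplier `k`, and the prices are assumptions for illustration.

```python
import statistics


def remove_outliers_iqr(prices, k=1.5):
    """Keep prices inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    This rule is an assumption for illustration; the source does not say
    which method the author used.
    """
    q1, _, q3 = statistics.quantiles(prices, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [p for p in prices if low <= p <= high]
    removed = [p for p in prices if p < low or p > high]
    return kept, removed


# Hypothetical unit prices within one cluster.
prices = [2900, 3000, 3100, 3050, 2950, 3020, 9000]
kept, removed = remove_outliers_iqr(prices)
print(kept, removed)
```

Returning the removed prices alongside the kept ones makes it easy to report what was excluded, which addresses the reviewer's point that outliers carry market information.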
11. Figure 3 – Without georeferencing and topographic information the figure is not readable, and no conclusion can be drawn from it. | Figure 3 shows the kriging interpolation only. It does not present maps of property values or any elements from an additional map. |
12. Ver. 336 – “The method used guarantees that within each of the zones reliability in the estimated property value is greater than 90%.” The Author bases this bold conclusion on Table 6, where the proposed approach is verified on only a few percent of the observations. The approach has been analysed on a single market and an outdated database, which is not very reliable. |
The accuracy of the property value estimates was checked using a test sample of 10% of the properties, which were not taken into account in the interpolation.
Further research will verify the method on a property market in another location, using different property data. |
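The verification described in this response (holding out 10% of the properties from the interpolation) could be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and "reliability" is taken here to mean one minus the mean absolute percentage error, which the record does not confirm as the author's definition.

```python
import random


def holdout_split(records, test_share=0.10, seed=0):
    """Shuffle a copy of the records and set aside a share for testing."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_share))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def mean_reliability(actual, estimated):
    """1 minus the mean absolute percentage error (an assumed definition)."""
    errors = [abs(e - a) / a for a, e in zip(actual, estimated)]
    return 1.0 - sum(errors) / len(errors)


# Hypothetical usage: interpolate from `train`, then compare the estimates
# at the held-out locations with their known transaction prices.
train, test = holdout_split(list(range(20)))
print(len(train), len(test))
print(mean_reliability([3000.0, 4000.0], [2850.0, 4200.0]))
```

Reporting this figure per cluster, together with the number of held-out transactions in each, would show how many observations support the 90% claim.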
13. The main conclusion is that the analyses are too modest and oversimplify a domain that is unusually complex. Interpreting the market area in a “continuous way” (Fig. 3) on the basis of sparse data points (transactions) is very controversial. It is possible, but requires very sophisticated analysis and research.
| The theoretical minimum number of properties for which interpolation with the kriging method can be applied is 30. The analysis shows that, for 90% reliability of the property value estimate, this number should be at least 200. The data must also cover the whole area evenly. It is possible to obtain that many records, especially in small towns, when data from multiple years are analysed. Nearly 2000 transactions from a 5-year period have been used in the analysis. |
Author Response File: Author Response.docx
Reviewer 4 Report
The paper overall is interesting and refers to a topic that is both timely and original. It is well written and of good quality, and presents a two-stage model for estimating the value of residential property, applied to a real-world case study in Poland. The literature review (introduction) could be enriched, e.g. see: Giannopoulou, M., Vavatsikos, V. and Lykostratis, K. (2016), A Process for Defining Relations between Urban Integration and Residential Market Prices, Procedia - Social and Behavioral Sciences, vol. 223, pp. 153-159.
Moreover, a critical point is that the authors should provide more information in the results section, as the discussion is currently rather limited. The authors could also try to provide the reader with some comparative information, comparing the results of this case study with other case studies with similar characteristics worldwide in order to explore potential similarities and differences. Last but not least, the section “Summary and Conclusions” is currently only a summary: no major conclusions are drawn, and the authors just briefly repeat what they did in this research. It is therefore necessary that they take their analysis one step further and add the main lessons learnt from applying their model to Siedlce, as well as some ideas and perspectives on what would be an interesting continuation of this research.
Author Response
The author thanks the anonymous reviewer for their work and the attention given to this paper.
The many deep comments I received helped to improve this article considerably and will also help me in future research.
Following the editor's suggestions, the new parts of the article are written in red.
The text was verified by a native speaker, an English lecturer.
1. The paper overall is interesting and refers to a topic that is both timely and original. It is well written and of good quality, and presents a two-stage model for estimating the value of residential property, applied to a real-world case study in Poland. The literature review (introduction) could be enriched. | The literature review has been enriched. |
2. Moreover, a critical point is that the authors should provide more information in the results section, as the discussion is currently rather limited. The authors could also try to provide the reader with some comparative information, comparing the results of this case study with other case studies with similar characteristics worldwide in order to explore potential similarities and differences. | A new chapter with a discussion has been added. The description of the results has been expanded. |
3. Last but not least, the section “Summary and Conclusions” is currently only a summary: no major conclusions are drawn, and the authors just briefly repeat what they did in this research. It is therefore necessary that they take their analysis one step further and add the main lessons learnt from applying their model to Siedlce, as well as some ideas and perspectives on what would be an interesting continuation of this research. | The conclusions have been expanded. |
Author Response File: Author Response.docx
Round 2
Reviewer 3 Report
The author's answers to the reviewer's doubts are not satisfactory. The reviewer's replies to the author's comments are in red font.
1. The Author presented research relevant to implementation automatic models to valuation real estate. Generally, scope of the research is very up-to-date and lively in wider international discussions. However, the presented specific results and approach has been already well studied field and the analysis is very basic and I cannot see any novelty. Moreover the presented findings consider by the Author as a new or innovative where overthrown to some extent 15 years ago by the researchers related to property market analyses.
| The elaborated approach is based on the assumption that the non-spatial and spatial attributes of a property should be analysed separately. As a result, a location-insensitive model is developed in the first stage, based only on the properties’ structural characteristics. Using data mining techniques, in particular k-means clustering, clusters of similar properties creating submarkets are formed. Each cluster is characterised by defined attribute values, expressed on a rank scale. During the second stage of the elaborated approach, a pure spatial model is developed, based on the assumption that the value of a property in each of the clusters depends exclusively on the distance between properties with known prices and the property being evaluated. A literature review shows that some authors have opted for the use of geostatistical methods in mass appraisal (Basus, 2000), with some of them using the segmentation method of property market analyses (Maclennan et al, 1996). In the geostatistical methods the simplest varieties of kriging have been used. Cichociński (2009) carried out an attempt to apply the geostatistical methods, namely simple kriging, to interpolate property values, while Ligas (2009) as well as Colaco and Vucetic (2012) applied the regression-kriging model, called the hybrid model, to estimate the value of land. Other, more advanced methods of data processing and interpolation, including data integration, such as cokriging, were used rarely, with a publication of Chica-Olmo (2007) being one of the few. None of these methods has shown sufficient accuracy to significantly affect the methods of property evaluation. The application of the two-stage approach to mass valuation of properties using the k-means method, and the geostatitical method in the second stage, is a novel methodology, especially for local properties.
The number of articles indexed in WoS and Scopus databases on “Mass appraisal and geostatistics” equals 1 (from 1900 to 2019). The number of articles indexed in WoS and Scopus databases on “Mass appraisal and k-means method grouping” equals 1 (from 1900 to 2019). The number of articles indexed in WoS and Scopus databases on “Mass appraisal, geostatistics, and k-means method grouping” equals 0 (from 1900 to 2019). |
Reviewer answer-second review: Combining spatial and non-spatial approaches in property analyses is definitely not an innovative approach. See, for example, papers in the Real Estate Management and Valuation (REMV) journal. The author showed that mass appraisal and geostatistics do not appear together in papers indexed in WoS. This is a misunderstanding. The author should take a broader view of property analyses and geostatistics or similar topics. Moreover, the author has probably confused automated valuation methods and mass appraisal, which are not synonyms. From the author's approach, the reviewer understands that the author proposes a mass appraisal procedure, which is a very complex procedure in which mistakes have serious consequences. This indicates that the aim of the study is not clear. In my opinion, the paper can be considered a procedure for property market analysis in investment advisory rather than a mass appraisal procedure. | |
2. Ver.61. – „The multidimensional nature of geographical space is usually ignored and the impact of a property’s situation reduced to an analysis of its location and neighbourhood, treated as environmental features. “
This statement seems to be misbelief. First it’s not true that geographical space is usually ignored that even Author confirmed this fact by presenting many authors that made an implementation of it. Furthermore, the Author claimed that researchers take only into account location and neighbourhood instead of nature of geographical space during the real estate market analyses. The question is: what then the nature of geographical space is? To differentiate it from location and neighbourhood? | The answer to the doubts is in lines 340-345:
“In the hedonic model, the most commonly-used one for estimating property values, spatial position (location) is taken into account indirectly, by determining the accessibility and neighbourhood. Accessibility is measured as distance from the centre, in line with the location theory developed by von Thunen, and neighbourhood usually understood as the property’s purpose in the land development plan. The proposed model takes into account location, analysed in accordance with the rules of geostatistics and interpolation using the kriging model.” |
Reviewer answer-second review: Unsatisfactory answer. Why is using spatial position better than using distances or positioning? What are the disadvantages of the latter? | |
3. Additionally the Author actually didn’t even define any location feature or neighbourhood, and try to classify real estate market taking into account just coordinates which is illogical when we want to create the real estate valuation chart (consider valuation map). It seems that Author didn’t consider fundamental knowledge about the real estate market which is specific domain of the analyses in which we can observe: uncertain, imprecision of property information, the sudden and unpredictable changes, absence of homogenous functional dependencies between real estate attributes, significant differences between real estates etc. | Kriging assumes that the distance or direction between sample points reflects a spatial correlation that can be used to explain variation in the surface. The Kriging tool fits a mathematical function to a specified number of points, or all points within a specified radius, to determine the output value for each location. Kriging is a multistep process; it includes exploratory statistical analysis of the data, variogram modelling, creating the surface, and exploring a variance surface. There is an assumption that when kriging is used there is no need to define the location or neighbourhood characteristics which would be applied to the grouping with the k-means method. |
Reviewer answer-second review: Not satisfactory. For example, what does the mentioned “variation in the surface” mean with regard to the differentiation of properties in space? See the previous question as well. | |
4. Ver.77 – „The most frequently used cartographic method are isolines or choropleth maps. „ and ver. 83.
What is the basis of such statement? | A thorough analysis of cartographic presentation methods used to develop maps of property value was conducted in the doctoral dissertation. The reference has been added in the revised article. |
Reviewer answer-second review: Not satisfactory. The author only cited herself. | |
5. Ver. 88 – This is not new approach, for sure. eg: Maclennan, D & Tu, Y 1996, 'Economic perspectives on the structure of local housing systems' Housing Studies, vol. 11, no. 3, pp. 387-406. and many others.
| Maclennan & Tu in the article ‘Economic perspectives on the structure of local housing systems' examines the notions of market and sub‐market in the context of housing. It first proposes specific definitions and then clarifies why the general characteristics of housing are likely to generate sub‐markets. In this article the author uses the division into sub-market. The aim of the article is, however, to develop a two-stage methodology for estimating property values using the k-means and geostatic methods, which is a new approach. The number of articles indexed in WoS and Scopus databases on “Mass appraisal, geostatistics, and k-means method grouping” equals 0 (from 1900 to 2019). |
Reviewer answer-second review: Not satisfactory. See, e.g., the numerous papers on this topic in REMV. The cited research only proves that similar analyses were considered more than 20 years ago. | |
6. Ver.215 – in the paper should be insert: the formula to update the transaction and verification of the model. Moreover, the Author didn’t present the specific information about the attributes and their domains. This has direct influence on the selected method results and their quality etc. |
The table with attributes and its domains is added on page 6 in the revised text. Additional information about the methods of transaction prices updating is added on page 3 and 6. |
Reviewer answer-second review: Not satisfactory. Still no formulas or analyses of the significance of the model are provided. | |
7. Ver. 221- one of the attributes is the “standard of the flat”, as a matter of fact that kind of attribute does not exist in the national property registry, so the question is: How the Author obtained this attribute for 1873 transactions from 2007 -2011 years? |
The attribute was taken from the Polish registry of property prices. |
Reviewer answer-second review: Not satisfactory. No answer regarding the “standard” variable. The registry of property prices collects data from notarial deeds, in which no standard is described. | |
8. Ver. 218. – the verification of the significance the correlation results should be presented. | The p-value was added to Table 2 (page 6). |
Reviewer answer-second review: Approved | |
9. Ver. 228 – the method of k-means clustering has a lot of obstacles eg: diversity of the analysed variable, coding of variable, distribution of the variables, that author didn’t mention. |
The author added the limitation of k-means method on page 11 (line 355-360). |
Reviewer answer-second review: Approved (although k-means has many more disadvantages) | |
10. Ver. 264 - “In the other clusters, a normal distribution of prices was obtained by removing outlying data. “ The author should present the method to remove outliers. It should be highlited that outliers should be carefully treated due to the fact that their are the additional (precious) information about market.
|
Some more information on removing outlying data was added. It is obvious that outliers constitute additional information about the property market. However, in this case typical properties were used only. |
Reviewer answer-second review: Approved | |
11. Figure 3. – without georeferences and topographic information is not readable, and the draw conclusion is not possible.
| Figure 3 shows kriging interpolation only. It doesn’t present maps of property values or any elements from an additional map.
|
Reviewer answer-second review: So then for whom and for what purpose were these figures prepared, considering the scope of the analyses? | |
12. Ver. 336 – “The method used guarantees that within each of the zones reliability in the estimated property value is greater than 90%. The Author states this brave conclusion on the basis of the Table 6 where verified the proposed approach on the basis of the several percent of the observations. The approaches have been analysed on the basis of the one market and outdated database which is not very reliable.’
|
The accuracy of the property value estimates was checked using a test sample of 10% of the properties not taken into account in the interpolation.
Further research will deal with a verification of the method in another location of property market, using different property data. |
Reviewer answer-second review: Not satisfactory. It should be verified with more observations. The results seem unreliable, especially if the author considers this method in a mass appraisal context. No formula for verifying the results was provided. | |
13. The main conclusion is that the analyses are too modest and too simplified the domain of the analyses that is unusually complex. The interpretation of the area of market in the “continuous way” (Fig. 3) is very controversial on the basis of the sparse data points (transactions). It is possible but needs very sophisticated analyses and researches.
| The theoretical and minimum number of properties on the basis of which interpolation with the kriging method can be applied is 30. The analysis shows that for 90% reliability of the property value estimate this number should be at least 200. It is also necessary to evenly include the whole area. It is possible to obtain so many data sets, especially in small towns, when data from multiple years are analysed. Nearly 2000 transactions from a 5-year time period have been used in the analysis. |
Reviewer answer-second review: More analyses and a larger area should be considered to prove the obtained findings. |
Author Response
The author would like to thank the anonymous reviewer for their work and the attention given to this paper. Many deep and critical comments have been used to improve this article and will also be used in future research.
Please see the detailed point-by-point cover letter in the attachment.
Author Response File: Author Response.docx
Round 3
Reviewer 3 Report
Thank you for the answers.
Author Response
The author would like to thank the anonymous reviewer for their work and the attention given to this paper.
The sentences in lines 10 and 95 have been revised. The sentence in line 106 has been removed.