Quantifying Liveability Using Survey Analysis and Machine Learning Model
Abstract
1. Introduction
1.1. Purpose Statement
- To design a tangible metric for liveability at the individual level (in this study, an individual’s preference to live in a given place) and make it scalable across any administrative unit (a postal code, a village, a city, or a state);
- To understand and quantify the marginal contribution of the different factors of liveability towards the designed metric.
1.2. Research Design
- Hypotheses generation;
- Design & distribution of the questionnaire;
- Collection & analysis of survey data (Feature Engineering & data munging);
- Fitting ML Classification model to predict liveability;
- Using the SHAP model to quantify the marginal contribution of the different factors.
2. Methodology
2.1. Hypotheses Generation
2.2. Design and Exploratory Analysis of the Questionnaire
2.3. Building an ML Model
2.3.1. Feature Engineering & Data Munging for ML Training
2.3.2. Fitting ML Classification Model to Predict Liveability
Splitting the Train-Test Data
- Training set: a subset from the dataset to train the model;
- Test set: a holdout subset from the dataset to evaluate the model.
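The holdout split described above can be sketched with the Python standard library alone. The 80/20 ratio, the random seed, and the `responses` rows are illustrative assumptions, not values reported by the study.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical survey rows: (respondent_id, would_stay)
responses = [(i, i % 3 != 0) for i in range(10)]
train, test = train_test_split_rows(responses, test_fraction=0.2)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so the model is always evaluated on the same unseen respondents.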
Hyperparameter Tuning
2.3.3. Predicting the Test Data
- False Positive (FP, Type I error): The model predicted that the resident would stay when the resident indicated that they would not;
- False Negative (FN, Type II error): The model predicted that the resident would not stay when the resident indicated that they would.
The true positives and true negatives of the model should be interpreted as follows:
- True Positive (TP): The model correctly predicted that the resident would stay when the resident indicated that they would;
- True Negative (TN): The model correctly predicted that the resident would not stay when the resident indicated that they would not.
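The four outcomes above can be counted directly from paired labels, and the usual summary metrics follow from the counts. This is a sketch on made-up labels, where `True` means "would stay"; the `actual` and `predicted` lists are hypothetical and do not come from the survey.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn), treating True as the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1      # predicted stay, resident would stay
        elif p and not a:
            fp += 1      # predicted stay, resident would not (Type I)
        elif a:
            fn += 1      # predicted not stay, resident would (Type II)
        else:
            tn += 1      # predicted not stay, resident would not
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summary_metrics(tp, fp, fn, tn):
    """Derive accuracy, sensitivity, specificity, precision, and F1."""
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = _ratio(tp + tn, tp + fp + fn + tn)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


# Hypothetical labels for eight respondents
actual = [True, True, True, False, False, True, False, True]
predicted = [True, True, False, False, True, True, False, True]
tp, fp, fn, tn = confusion_counts(actual, predicted)
metrics = summary_metrics(tp, fp, fn, tn)
print(tp, fp, fn, tn, metrics)
```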
2.4. SHAP Model
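SHAP attributes a prediction to individual features using Shapley values, which average each feature's marginal contribution over all subsets of the other features. The sketch below computes exact Shapley values by brute force for a toy value function. It is not the SHAP library and not the study's model: the feature names, the `toy_stay_score` function, and its coefficients are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley value of each feature for a set-valued function."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # marginal contribution of f when added to coalition s
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_stay_score(present):
    """Hypothetical score for 'would stay' given the known features."""
    score = 0.5                                   # baseline
    if "safety" in present:
        score += 0.2
    if "internet" in present:
        score += 0.1
    if "safety" in present and "green_space" in present:
        score += 0.1                              # interaction term
    return score


features = ["safety", "internet", "green_space"]
phi = shapley_values(toy_stay_score, features)
print(phi)
```

The values sum to the difference between the score with all features and the baseline score, and the interaction term is shared equally between the two features involved. Brute force scales exponentially with the number of features, which is why tree-specific approximations are used in practice.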
3. Results and Discussions
3.1. Descriptive Statistics
3.2. Random Forest Classifier Model Predictions
3.2.1. F1 Score and ROC Curve
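The area under the ROC curve can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the `actual` labels and `scores` are made up for illustration and are not the study's predictions.

```python
def roc_auc(actual, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for a, s in zip(actual, scores) if a]
    neg = [s for a, s in zip(actual, scores) if not a]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of 'would stay'
actual = [True, True, False, True, False]
scores = [0.9, 0.6, 0.7, 0.8, 0.2]
auc = roc_auc(actual, scores)
print(round(auc, 4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.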
3.2.2. Liveability Confusion Matrix
3.2.3. Random Forest
3.3. Feature Importance
3.4. Shapley Values and Their Analysis
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Assumed Factors from Literature Reviews | Questionnaire Framework | Derived Features |
---|---|---|
Demographic identifier | | Spatial Feature |
Transportation Facilities | | Perception Feature |
Accessibility | | Perception Feature |
Neighborhood | | Perception Feature |
Ecology and Environment | | Perception Feature |
Civic and Social Engagement | | Perception Feature |
Opportunity | | Perception Feature |
Likelihood to stay in their neighborhood | | Liveability Metric |
Machine Learning Model | AUC (%) | Accuracy (%) | Sensitivity (%) | Specificity (%) | F1-Measure (%) |
---|---|---|---|---|---|
Random Forest | 70 | 80.5 | 92.6 | 71.4 | 89.41 |
SVM | 68.5 | 78.6 | 80 | 65 | 80.52 |
Decision Tree | 65.6 | 77.6 | 88.3 | 70.5 | 84.7 |
Naïve Bayesian | 69.5 | 70.52 | 77.84 | 75 | 79.5 |
Features | Positive Responses (Approx. %) (Only Ratings 5 and 4) | Other Responses |
---|---|---|
Residents who use public transportation | 28% | The rest used private modes of transportation |
Residents who waited for a long time in traffic | 14% | 52% of the residents never waited in traffic |
Residents who could find ample open or green space for exercise, walking, jogging, or cycling in their neighborhood | 44% | |
Residents who felt that good-quality drinking water and air were available | 73% | |
Residents who had access to a grocery store or a market | 82% | |
Residents who had access to health care services | 73% | |
Residents who had access to the internet | 88% | |
Residents who said that they have experienced perfect socioeconomic equality | 26% | 26% reported experiencing high socioeconomic inequality |
Residents who rated the availability of cultural, arts, sports, or entertainment institutions in their city | 90% | |
Residents who rated that they had a good or great educational opportunity in their location | 60% | Only 5% lacked good educational opportunities |
Residents who rated the employment opportunities in their location | 80% of them felt it was not very high | |
Residents who rated their access to farm products | 30% | |
Residents who rated the safety of their neighborhood | 54% | Only 7% of the respondents felt that their neighborhood was not safe |
Residents who were happy with their neighborhood | 94% | |
Residents who wanted to continue living in their location (Liveability metric) | 65% | |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sujatha, V.; Lavanya, G.; Prakash, R. Quantifying Liveability Using Survey Analysis and Machine Learning Model. Sustainability 2023, 15, 1633. https://doi.org/10.3390/su15021633