Improving Data Utility in Privacy-Preserving Location Data Collection via Adaptive Grid Partitioning
Abstract
:1. Introduction
- First, we introduce a method to effectively compute the distribution of users during DP-based location data collection. This approach is able to effectively capture user distribution in real-time and adapt to dynamic changes in user behavior.
- Then, we propose a method to adaptively adjust the grid to maximize the utility of the collected location data under DP. This adaptive grid adjustment is designed to improve the granularity and relevance of the data, ensuring that the most significant and densely populated areas are prioritized, thus improving the overall quality and applicability of the data.
- We evaluated the performance of the proposed algorithms using real-world datasets. The evaluation results demonstrated that the proposed scheme significantly enhances the utility of the collected location data compared to existing methods
2. Related Work
3. Background and Problem Statement
3.1. Background
3.2. Problem Statement
4. Proposed Method
- Collection of perturbed location data from sampled users: The server first computes the obfuscation matrix, M, over uniformly partitioned grids and distributes it to the sampled users, who then send their perturbed locations back to the server.
- Estimation of the distribution of users: The server estimates the distribution of users based on the perturbed location data collected from the sampled users.
- Computation of adaptively partitioned grids: The server uses the estimated distribution of users to compute adaptively partitioned grids.
- Collection of perturbed location data from remaining users using adaptively partitioned grids: A new obfuscation matrix is computed using the adaptively partitioned grids. This new obfuscation matrix is then used to collect location data from the remaining users.
4.1. Collecting Perturbed Location Data from Sampled Users
4.2. Estimating Probability Distribution
- Initialization: The parameter (i.e., prior probability) is initialized as follows:
- E-step: The posterior probability is calculated based on the current parameters as follows:
- M-step: The parameter is updated using the current posterior probabilities calculated in the previous E-step:Here, represents the number of data in . After updating the prior probabilities, we perform a normalization step to ensure that the sum of all prior probabilities equals 1 as follows:
4.3. Computing Adaptively Partitioned Grids
Algorithm 1: Pseudo-code for adaptive grid partition |
4.4. Collecting Perturbed Location Data from Remaining Users Using Adaptively Partitioned Grids
5. Experiments
5.1. Experimental Setup
- Data-level metric measures the similarity between the true location dataset and the perturbed location dataset collected under -Geo-Ind. For the data-level evaluation, we use both the average count error (ACE) and the density error. The average count error quantifies the difference between the actual number of users, , and the number derived from the perturbed dataset, , for each grid. It is calculated asThe density error measures the difference between the actual density distribution of users and the perturbed version computed from the datasets collected under -Geo-Ind. This error is measured asHere, represents the Jenson–Shannon divergence between two distributions from the original location dataset, and from the perturbed location dataset, .
- Application-level metric evaluates the utility of collected data from the perspective of applications that use it. We use range query error for this metric, a widely recognized measure for evaluating the effectiveness of location data [14]. In the experiment, we generate a range query, , with a random region R, and compare the number of results from the original location dataset, , with those from the perturbed location datasets, ). It is calculated asIn the experiments, we generated 200 range queries and reported the average range query error.
5.2. Results
5.2.1. Data-Level Evaluation
5.2.2. Application-Level Evaluation
5.2.3. Evaluation of Network Variability and Grid Adaptation Effects
6. Conclusions and Future Work
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
DP | Differential Privacy |
LDP | Local Differential Privacy |
MDP | Metric Differential Privacy |
Geo-Ind | Geo-Indistinguishability |
MCS | Mobile Crowdsensing |
EM | Expectation-Maximization |
JSD | Jenson–Shannon Divergence |
References
- Wang, X.; Ma, Y.; Wang, Y.; Jin, W.; Wang, X.; Tang, J.; Jia, C.; Yu, J. Traffic flow prediction via spatial temporal graph neural network. In Proceedings of the Web Conference, Taipei, Taiwan, 20–24 April 2020; pp. 1082–1092. [Google Scholar]
- Pan, Z.; Liang, Y.; Wang, W.; Yu, Y.; Zheng, Y.; Zhang, J. Urban traffic prediction from spatio-temporal data using deep meta learning. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 April 2019; pp. 1720–1730. [Google Scholar]
- Kim, J.S.; Kim, J.W.; Chung, Y.D. Successive Point-of-Interest Recommendation With Local Differential Privacy. IEEE Access 2021, 9, 66371–66386. [Google Scholar] [CrossRef]
- An, J.; Li, G.; Jiang, W. NRDL: Decentralized user preference learning for privacy-preserving next POI recommendation. Expert Syst. Appl. 2024, 239, 122421. [Google Scholar] [CrossRef]
- Primault, V.; Boutet, A.; Mokhtar, S.B.; Brunie, L. The long road to computational location privacy: A survey. IEEE Commun. Surv. Tutor. 2019, 21, 2772–2793. [Google Scholar] [CrossRef]
- Liu, B.; Zhou, W.; Zhu, T.; Gao, L.; Xiang, Y. Location privacy and its applications: A systematic study. IEEE Access 2019, 6, 17606–17624. [Google Scholar] [CrossRef]
- Alharthi, R.; Banihani, A.; Alzahrani, A.; Alshehri, A.; Alshahrani, H.; Fu, H.; Liu, A.; Zhu, Y. Location privacy challenges in spatial crowdsourcing. In Proceedings of the IEEE International Conference on Electro/Information Technology, Rochester, MI, USA, 3–5 May 2018. [Google Scholar]
- Henriksen-Bulmer, J.; Jeary, S. Re-identification attacks—A systematic literature review. Int. J. Inf. Manag. 2016, 36, 1184–1192. [Google Scholar] [CrossRef]
- Dwork, C. Differential privacy. In Proceedings of the International Colloquium on Automata, Languages, and Programming, Venice, Italy, 10–14 July 2006; pp. 1–12. [Google Scholar]
- Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third conference on Theory of Cryptography, New York, NY, USA, 4–7 March 2006. [Google Scholar]
- Bordenabe, N.E.; Chatzikokolakis, K.; Palamidess, C. Optimal geo-indistinguishable mechanisms for location privacy. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, 3–7 November 2014; pp. 251–262. [Google Scholar]
- Kim, J.; Jang, B. Workload-aware indoor positioning data collection via local differential privacy. IEEE Commun. Lett. 2019, 23, 1352–1356. [Google Scholar] [CrossRef]
- Zhang, P.; Cheng, X.; Su, S.; Wang, N. Area coverage-based worker recruitment under geo-indistinguishability. Comput. Netw. 2022, 217, 109340. [Google Scholar] [CrossRef]
- Du, Y.; Hu, Y.; Zhang, Z.; Fang, Z.; Chen, L.; Zheng, B.; Gao, Y. LDPTrace: Locally differentially private trajectory synthesis. In Proceedings of the VLDB Endowment, Vancouver, BC, Canada, 28 August–1 September 2023; pp. 1897–1909. [Google Scholar]
- Ghaemi, Z.; Farnaghi, M. A Varied Density-based Clustering Approach for Event Detection from Heterogeneous Twitter Data. ISPRS Int. J. Geo-Inf. 2019, 8, 82. [Google Scholar] [CrossRef]
- Alvim, M.; Chatzikokolakis, K.; Palamidessi, C.; Pazii, A. Local Differential Privacy on Metric Spaces: Optimizing the Trade-Off with Utility. In Proceedings of the IEEE Computer Security Foundations Symposium, Oxford, UK, 9–12 July 2018. [Google Scholar]
- Andres, M.E.; Bordenabe, N.E.; Chatzikokolakis, K.; Palamidessi, C. Geo-indistinguishability: Differential privacy for location-based systems. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Berlin, Germany, 4–8 November 2013; pp. 901–914. [Google Scholar]
- Chatzikokolakis, K.; Palamidessi, C.; Stronati, M. Geo-indistinguishability: A principled approach to location privacy. In Proceedings of the International Conference on Distributed Computing and Internet Technology, Bhubaneswar, India, 5–8 February 2015; pp. 49–72. [Google Scholar]
- Wang, L.; Yang, D.; Han, X.; Wang, T.; Zhang, D.; Ma, X. Location privacy-preserving task allocation for mobile crowdsensing with differential geo-obfuscation. In Proceedings of the International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 627–636. [Google Scholar]
- Qiu, C.; Squicciarini, A.C. Location privacy protection in vehicle-based spatial crowdsourcing via geo-indistinguishability. In Proceedings of the IEEE International Conference on Distributed Computing Systems, Dallas, TX, USA, 7–10 July 2019; pp. 1061–1071. [Google Scholar]
- Jin, W.; Xiao, M.; Guo, L.; Yang, L.; Li, M. ULPT: A user-centric location privacy trading framework for mobile crowd sensing. IEEE Trans. Mob. Comput. 2022, 21, 3789–3806. [Google Scholar] [CrossRef]
- Huang, P.; Zhang, X.; Guo, L.; Li, M. Incentivizing crowdsensing-based noise monitoring with differentially-private locations. IEEE Trans. Mob. Comput. 2021, 20, 519–532. [Google Scholar] [CrossRef]
- Zhao, Y.; Yuan, D.; Du, J.T.; Chen, J. Geo-Ellipse-Indistinguishability: Community-aware location privacy protection for directional distribution. IEEE Trans. Knowl. Data Eng. 2023, 35, 6957–6967. [Google Scholar] [CrossRef]
- Yu, L.; Zhang, S.; Meng, Y.; Du, S.; Chen, Y.; Ren, Y.; Zhu, H. Privacy-preserving location-based advertising via longitudinal geo-indistinguishability. IEEE Trans. Mob. Comput. 2024, 23, 8256–8273. [Google Scholar] [CrossRef]
- Zhao, Y.; Chen, J. Vector-indistinguishability: Location dependency based privacy protection for successive location data. IEEE Trans. Comput. 2024, 73, 970–979. [Google Scholar] [CrossRef]
- Mendes, R.; Cunha, M.; Vilela, J.P. Velocity-aware geo-indistinguishability. In Proceedings of the ACM Conference on Data and Application Security and Privacy, Charlotte, NC, USA, 24–26 April 2023; pp. 141–152. [Google Scholar]
- Ren, W.; Tang, S. EGeoIndis: An effective and efficient location privacy protection framework in traffic density detection. Veh. Commun. 2020, 21, 100187. [Google Scholar] [CrossRef]
- Kim, J.W.; Lim, B. Effective and Privacy-Preserving Estimation of the Density Distribution of LBS Users under Geo-Indistinguishability. Electronics 2023, 12, 917. [Google Scholar] [CrossRef]
- Chen, R.; Li, L.; Ma, Y.; Gong, Y.; Guo, Y.; Ohtsuki, T.; Pan, M. Constructing Mobile Crowdsourced COVID-19 Vulnerability Map With Geo-Indistinguishability. IEEE Internet Things J. 2022, 9, 17403–17416. [Google Scholar] [CrossRef]
- Fathalizadeh, A.; Moghtadaiee, V.; Alishahi, M. Indoor Geo-Indistinguishability: Adopting Differential Privacy for Indoor Location Data Protection. IEEE Trans. Emerg. Top. Comput. 2023, 12, 293–306. [Google Scholar] [CrossRef]
- Feyisetan, O.; Balle, B.; Drake, T.; Diethe, T. Privacy-and utility-preserving textual analysis via calibrated multivariate perturbations. In Proceedings of the International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 178–186. [Google Scholar]
- Song, S.; Kim, J.W. Adapting Geo-Indistinguishability for Privacy-Preserving Collection of Medical Microdata. Electronics 2023, 12, 2793. [Google Scholar] [CrossRef]
- Ahuja, R.; Ghinita, G.; Shahabi, C. A utility-preserving and scalable technique for protecting location data with geo-indistinguishability. In Proceedings of the International Conference on Extending Database Technology, Lisbon, Portuga, 26–29 March 2019; pp. 210–231. [Google Scholar]
- Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational Inference: A Review for Statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877. [Google Scholar] [CrossRef]
- Chib, S. Markov Chain Monte Carlo Methods: Computation and Inference. Handb. Econom. 2001, 5, 3569–3649. [Google Scholar]
- Li, Y.; Hernandez-Lobato, J.M.; Turner, R.E. Stochastic Expectation Propagation. arXiv 2018, arXiv:1506.04132. [Google Scholar]
- Sammaknejad, N.; Zhao, Y.; Huang, B. A review of the Expectation Maximization algorithm in data-driven process identification. J. Process Control 2019, 73, 123–136. [Google Scholar] [CrossRef]
- Howell, C.R.; Su, W.; Nassel, A.F.; Agne, A.A.; Cherrington, A.L. Area based stratified random sampling using geospatial technology in a community-based survey. BMC Public Health 2020, 20, 1678. [Google Scholar] [CrossRef] [PubMed]
- Moreira-Matias, L.; Gama, J.; Ferreira, M.; Mendes-Moreira, J.; Damas, L. Predicting taxi–passenger demand using streaming data. IEEE Trans. Intell. Transp. Syst. 2013, 14, 1393–1402. [Google Scholar] [CrossRef]
- Apache Hadoop. Available online: https://hadoop.apache.org/ (accessed on 22 July 2024).
- Apache Spark—Unified Engine for Large-Scale Data Analytics. Available online: https://spark.apache.org/ (accessed on 22 July 2024).
AG | NG | |||||
Loss Rate of Sampled Location Data | 0% | 1% | 5% | 10% | 20% | |
Average Count Error | 4.705 | 4.867 | 4.896 | 4.936 | 5.018 | 9.842 |
Density Error | 0.319 | 0.321 | 0.328 | 0.336 | 0.341 | 0.440 |
Grid size | |||||
Execution time (s) | 4.10 | 17.41 | 43.12 | 139.56 | 439.56 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kim, J. Improving Data Utility in Privacy-Preserving Location Data Collection via Adaptive Grid Partitioning. Electronics 2024, 13, 3073. https://doi.org/10.3390/electronics13153073
Kim J. Improving Data Utility in Privacy-Preserving Location Data Collection via Adaptive Grid Partitioning. Electronics. 2024; 13(15):3073. https://doi.org/10.3390/electronics13153073
Chicago/Turabian StyleKim, Jongwook. 2024. "Improving Data Utility in Privacy-Preserving Location Data Collection via Adaptive Grid Partitioning" Electronics 13, no. 15: 3073. https://doi.org/10.3390/electronics13153073
APA StyleKim, J. (2024). Improving Data Utility in Privacy-Preserving Location Data Collection via Adaptive Grid Partitioning. Electronics, 13(15), 3073. https://doi.org/10.3390/electronics13153073