Topic-Clustering Model with Temporal Distribution for Public Opinion Topic Analysis of Geospatial Social Media Data
Abstract
1. Introduction
2. Related Works
3. Topic-Clustering Model with Temporal Distribution
3.1. DMMOT Topic Model
- I. Select the topic z of document d according to the document-topic multinomial distribution;
- II. Select the word distribution of the zth topic according to the topic-word multinomial distribution. Then, the Nd words of the document are obtained by sampling Nd times from this word distribution;
- III. According to the time attribute of the document, the probability that the document belongs to each topic is obtained from the “topic-time” probability distribution, which forms part of the Gibbs sampling probability.
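The three steps above can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: it assumes symmetric Dirichlet priors, assumes that each topic's “topic-time” distribution is a Beta distribution over timestamps normalised to (0, 1), and uses hypothetical names (`sample_dirichlet`, `generate_corpus`, `psi`) and randomly chosen Beta parameters purely for illustration.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw one symmetric-Dirichlet sample by normalising Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(num_docs, num_topics, vocab_size, doc_len,
                    alpha=1.0, beta=0.5, seed=0):
    """Simulate documents as (topic, word ids, timestamp) triples."""
    rng = random.Random(seed)
    # Corpus-level document-topic proportions and per-topic word distributions.
    theta = sample_dirichlet(rng, alpha, num_topics)
    phi = [sample_dirichlet(rng, beta, vocab_size) for _ in range(num_topics)]
    # Hypothetical Beta shape parameters per topic (illustrative values only).
    psi = [(rng.uniform(1.0, 5.0), rng.uniform(1.0, 5.0))
           for _ in range(num_topics)]

    corpus = []
    for _ in range(num_docs):
        # Step I: one topic for the whole document.
        z = rng.choices(range(num_topics), weights=theta)[0]
        # Step II: draw the document's words from the word distribution of topic z.
        words = rng.choices(range(vocab_size), weights=phi[z], k=doc_len)
        # Step III (assumed form): draw a normalised timestamp from the topic's Beta.
        t = rng.betavariate(*psi[z])
        corpus.append((z, words, t))
    return corpus
```

For example, `generate_corpus(20, 3, 30, 8)` returns 20 simulated short documents, each with a single topic label, eight word ids, and one timestamp.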
3.1.1. Inference Process of the DMMOT Model by Gibbs Sampling
3.1.2. Gibbs Sampling Algorithm for the DMMOT Model
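Algorithm 1: The Gibbs sampling algorithm for the DMMOT model | |
Input: the number of topics K, the number of documents D, and the number of iterations iter | |
Output: the topic classification labels for all documents | |
1 | set the hyperparameters of the Dirichlet distributions, and initialize the per-topic document counts, total word counts, and word counts to 0; |
2 | for each document d do |
3 | randomly sample a topic classification label z for document d; |
4 | insert the timestamp of d into the timestamp set of topic z; |
5 | increase the document count of topic z by 1; increase the total word count of topic z by Nd; |
6 | for each word w in document d do |
7 | increase the count of word w under topic z by the number of times w appears in d; |
8 | end for |
9 | end for |
10 | calculate the mean and variance of the timestamps under every topic; |
11 | employ the method of moments to estimate the parameters of the Beta distribution of every topic; |
12 | for each iteration from 1 to iter do |
13 | for each document d do |
14 | record the current topic classification label z and the timestamp of document d; |
15 | decrease the document count of topic z by 1; decrease the total word count of topic z by Nd; |
16 | for each word w in document d do |
17 | decrease the count of word w under topic z by the number of times w appears in d; |
18 | end for |
19 | sample a new topic label z for document d based on the deduced Gibbs sampling equation; |
20 | insert the timestamp of d into the timestamp set of the new topic z; |
21 | increase the document count of topic z by 1; increase the total word count of topic z by Nd; |
22 | for each word w in document d do |
23 | increase the count of word w under topic z by the number of times w appears in d; |
24 | end for |
25 | end for |
26 | use the mean and variance of the timestamps to re-estimate the Beta distribution parameters of every topic; |
27 | end for |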
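The loop structure of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the sampling weight is assumed to be the collapsed Gibbs conditional of the Dirichlet multinomial mixture model multiplied by the Beta density of the document's timestamp, timestamps are assumed to be normalised into (0, 1), and a topic with too few or degenerate timestamps is assumed to fall back to a uniform Beta(1, 1). All function and variable names (`fit_beta`, `log_beta_pdf`, `gibbs_dmmot`, `move`) are hypothetical.

```python
import math
import random
from collections import Counter


def fit_beta(times):
    """Method-of-moments Beta(a, b) estimate; falls back to Beta(1, 1)."""
    if len(times) < 2:
        return 1.0, 1.0
    mean = sum(times) / len(times)
    var = sum((t - mean) ** 2 for t in times) / len(times)
    if var <= 0.0 or var >= mean * (1.0 - mean):
        return 1.0, 1.0
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def log_beta_pdf(t, a, b):
    """Log density of Beta(a, b), with t clamped away from 0 and 1."""
    t = min(max(t, 1e-6), 1.0 - 1e-6)
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t) - log_norm


def gibbs_dmmot(docs, times, K, V, alpha=0.1, beta=0.1, iters=20, seed=0):
    """docs: lists of word ids in [0, V); times: timestamps in (0, 1)."""
    rng = random.Random(seed)
    D = len(docs)
    m = [0] * K                          # documents per topic
    n = [0] * K                          # total words per topic
    nw = [Counter() for _ in range(K)]   # count of each word per topic
    counts = [Counter(doc) for doc in docs]
    z = [rng.randrange(K) for _ in range(D)]   # random initial labels

    def move(d, k, sign):
        """Add (sign=+1) or remove (sign=-1) document d from topic k."""
        m[k] += sign
        n[k] += sign * len(docs[d])
        for w, c in counts[d].items():
            nw[k][w] += sign * c

    def refit():
        return [fit_beta([times[d] for d in range(D) if z[d] == k])
                for k in range(K)]

    for d in range(D):
        move(d, z[d], +1)
    psi = refit()

    for _ in range(iters):
        for d in range(D):
            move(d, z[d], -1)            # take d out of its current topic
            logp = []
            for k in range(K):
                lp = math.log(m[k] + alpha)
                for w, c in counts[d].items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                for i in range(len(docs[d])):
                    lp -= math.log(n[k] + V * beta + i)
                lp += log_beta_pdf(times[d], *psi[k])   # assumed topic-time term
                logp.append(lp)
            top = max(logp)
            weights = [math.exp(lp - top) for lp in logp]
            z[d] = rng.choices(range(K), weights=weights)[0]
            move(d, z[d], +1)            # put d into the newly sampled topic
        psi = refit()                    # re-estimate Beta parameters per topic
    return z, psi
```

Working in log space and subtracting the maximum before exponentiating keeps the weights numerically stable for longer documents. The per-topic timestamp sets are recomputed from the labels here instead of being maintained incrementally, which is simpler but slower than the bookkeeping in Algorithm 1.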
4. Case Study and Discussion
4.1. Comparison Results for the DMM and LDA Topic Models
4.2. Results of Comparing the TOT and LDA Topic Models
4.3. Spatiotemporal Analysis and Discussion on the Mining Results for the DMMOT Model
4.3.1. Trends of Public Opinion Topics over Time
4.3.2. Spatial Distribution of Public Opinion Topics
5. Conclusions
- (1) From the perspective of the model-generation process, the DMMOT model assumes that each document is generated from one topic, which made it possible to read a document’s topic directly from the assignments produced by Gibbs sampling. Furthermore, we could obtain the fitted topic-time distribution and combine the spatial information with the topics for spatiotemporal analysis of public opinion topics.
- (2) The proposed DMMOT model performed better than the LDA, DMM, and TOT models for public opinion topic mining based on microblog data. The mining results indicated that the topic-word distributions generated by the DMMOT model were well differentiated between topics and semantically coherent within each topic. Meanwhile, the microblog texts under each topic were concentrated in a certain time window because of the topic-time distribution in the model assumption.
- (3) From the perspective of the temporal and spatial distribution of public opinion topics, the topic-time distribution obtained by the DMMOT model generated topics that were relatively concentrated in time, and the trends of the various topics over time were basically consistent with the corresponding topic content. The spatial distributions of all topics were concentrated in residential areas, and the detailed distribution of the hotspots was related to the topic summaries. Further, the spatial distribution of different public opinion topics can help identify hotspots of public opinion, support differentiated public opinion management, and guide public opinion accurately.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Parameter | Meaning of Parameters |
---|---|
 | Number of documents belonging to topic k |
D | Number of documents in the corpus |
K | Number of topics |
 | Number of words in the corpus |
 | Hyperparameters of the Dirichlet prior distributions of the “document-topic” and “topic-word” distributions |
 | Number of times word w appears in document d |
Nd | Total number of words in document d |
 | Number of times word w is assigned to topic k |
 | Total number of words belonging to topic k |
 | Time attribute value for document d |
 | Two parameters of the beta distribution for document |
Topic ID | Words Describing the Topics of the DMM Model | Words Describing the Topics of the LDA Model |
---|---|---|
0 | New, case, confirm, death, cumulative, COVID-19, number, pneumonia, suspected case, and discharge | Work resumption, new, case, confirm, firm, lift the lockdown, prevent and control, work, on duty, proof, enterprise, recover, staff, cumulative, and inform |
1 | Work resumption, firm, enterprise, work, influence, life, consumption, economy, express delivery, and situation | |
2 | Virus, China, COVID-19, pneumonia, America, infect, country, patient, coronavirus, and global | China, COVID-19, virus, country, America, global, pneumonia, world, economy, and influence |
3 | Lockdown, city, life, safe, early, spring, day, work resumption, Sakura, and expect | Life, safe, day, early, health, lockdown, hurry up, mood, at home, and expect |
4 | Neighborhood, community, estate, volunteer, resident, groupon, quarantine, confirm, supply, and go out | Neighborhood, community, volunteer, resident, estate, groupon, supply, staff, worker, and proprietor |
5 | Mask, go out, neighborhood, work resumption, supermarket, on duty, at home, go home, disinfect, and lockdown | Mask, go out, neighborhood, supermarket, disinfect, protection, at home, on the road, alcohol, and downstairs |
6 | Hospital, patient, protective clothing, mask, work, medical workers, frontline, supply, appreciate, and support | Hospital, patient, quarantine, test, nucleic acid, doctor, community, therapy, examine, CT, discharge, infect, fever, situation, and heat |
7 | Hospital, patient, quarantine, community, confirm, nucleic acid, pneumonia, doctor, test, and infect | |
8 | Work resumption, test, nucleic acid, quarantine, neighborhood, prevent and control, staff, community, lift the lockdown, and proof | |
9 | Mom, dad, hospital, at home, quarantine, worry, go home, go out, child, and infect | Mom, dad, on duty, protective clothing, child, go home, work, go back, at home, and husband |
10 | Hero, China, anti-epidemic, people, appreciate, salute, frontline, national, pneumonia, and fight | Frontline, appreciate, medical workers, anti-epidemic, people, hero, China, support, national, and fight |
Topic ID | Words Describing the Topics of the TOT Model | Popularity of Each Topic over Time |
---|---|---|
0 | Community, neighborhood, patient, quarantine, hospital, life, work, lockdown, COVID-19, and virus | |
2 | Work resumption, mask, test, COVID-19, nucleic acid, neighborhood, China, life, confirm, and go out | |
5 | Mask, pneumonia, go out, unknown, reason, vaccine, coronavirus, virus, patient, and novel | |
6 | Patient, hospital, community, neighborhood, life, quarantine, work, appreciate, COVID-19, and lockdown | |
7 | Hospital, quarantine, mask, patient, go out, confirm, infect, pneumonia, at home, and community | |
8 | Work resumption, neighborhood, virus, mask, COVID-19, quarantine, China, hospital, go out, and life | |
10 | Mask, work resumption, COVID-19, test, case, nucleic acid, quarantine, China, life, and confirm | |
13 | Work resumption, lift the lockdown, neighborhood, mask, hero, test, COVID-19, nucleic acid, go out, and life |
Topic ID | Words Describing the Topics of the LDA Model | Popularity of Each Topic over Time |
---|---|---|
3 | Hospital, patient, quarantine, test, nucleic acid, doctor, community, therapy, examine, and CT | |
4 | Frontline, appreciate, medical workers, anti-epidemic, people, hero, China, support, national, and fight | |
5 | Lockdown, quarantine, friend, at home, celebrate the spring festival, family, city, message, people, and government | |
10 | Neighborhood, community, volunteer, resident, estate, groupon, supply, staff, worker, and proprietor | |
11 | Coronavirus, pneumonia, novel, virus, infect, diary, COVID-19, asymptomatic, infected person, and article | |
12 | Work resumption, new, case, confirm, firm, lift the lockdown, prevent and control, work, on duty, and proof |
Topic ID | Topic Summary | Words Describing the Topics of the DMMOT Model | Number of Documents |
---|---|---|---|
0 | Expect and Pray | Lockdown, city, life, spring, safe, early, day, sakura, expect, and work resumption | 6156 |
1 | Material Donations | Hospital, supply, mask, patient, donate, pneumonia, medical workers, frontline, protective clothing, and Huoshenshan | 999 |
2 | Infection and Patients | Hospital, patient, quarantine, community, confirm, nucleic acid, mom, doctor, infect, and pneumonia | 1583 |
3 | Work and Family | Work resumption, mask, mom, at home, go out, life, lockdown, on duty, work, and dad | 9922 |
4 | Global Pandemic | China, country, virus, COVID-19, people, world, life, global, quarantine, and pneumonia | 4116 |
5 | Care for Family and Friends | Mask, go out, lockdown, at home, hospital, quarantine, friend, family, safe, and pneumonia | 5521 |
6 | Virus Profile | COVID-19, virus, pneumonia, patient, infect, coronavirus, China, America, test, and novel | 1771 |
7 | Community Epidemic | Community, neighborhood, patient, quarantine, confirm, resident, new, case, citywide, and prevent and control | 299 |
8 | Policy of Work Resumption | Work resumption, test, neighborhood, nucleic acid, mask, go out, lift the lockdown, on duty, quarantine, and firm | 4316 |
9 | Appreciation and Salutation | Appreciate, anti-epidemic, hero, people, frontline, China, salute, hospital, medical workers, and national | 3361 |
10 | Epidemic Report | New, case, confirm, death, cumulative, suspected case, number, suspect, data, and discharge | 784 |
11 | Medical Work | Patient, hospital, protective clothing, work, mask, quarantine, on duty, nurse, frontline, and medical workers | 2730 |
12 | Community Supply | Neighborhood, community, mask, go out, supermarket, groupon, at home, volunteer, confirm, and estate | 3385 |
13 | Work Influence | Work resumption, enterprise, work, prevent and control, firm, influence, life, pneumonia, COVID-19, and situation | 1696 |
14 | Virus Mechanism | Virus, America, Trump, mechanism, rise, person to person, China, society, COVID-19, and at present | 135 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hu, C.; Liang, Q.; Luo, N.; Lu, S. Topic-Clustering Model with Temporal Distribution for Public Opinion Topic Analysis of Geospatial Social Media Data. ISPRS Int. J. Geo-Inf. 2023, 12, 274. https://doi.org/10.3390/ijgi12070274