UGC Knowledge Features and Their Influences on the Stock Market: An Empirical Study Based on Topic Modeling
Abstract
1. Introduction
2. Related Studies
2.1. User-Generated Content (UGC)
2.2. Topic Modeling Analysis
2.3. Knowledge Discovery and Knowledge Feature Measurement
3. Research Methods
3.1. UGC Data Collection and Natural Language Processing
3.2. LDA Topic Recognition and Visualization
3.3. Language Analysis and Empirical Verification
4. Empirical Research
4.1. The Data Source and Data Preprocessing
4.2. Topic Identification
4.3. Topic Extraction Results
4.4. Language Analysis Results
5. Stock Market Impact
5.1. Market Feedback on Risk Attributes in UGC Knowledge Features
5.2. Implications
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
References
No. | Topic Name | Topic% | Keyword Stems | Risk Assessment | High-Frequency Company |
---|---|---|---|---|---|
1 | Financial attractiveness | 5.8 | finance, attract, cheap, mine, exposure, bioscience | 0.004890 | Microsoft, Tencent |
2 | Therapeutics report | 4.5 | therapeutic, Boeing, drive, podcast, airline, Airbus | 0.002189 | Boeing |
3 | Earnings | 5.0 | earn, analyze, Amazon, promos, approve, valuat | 0.000044 | Amazon, McMoRan |
4 | Undervalued company | 5.1 | undervalue, busy, develop, revenue, growth, lithium | 0.041475 | AstraZeneca |
5 | Tesla model | 4.7 | Tesla, China, electro, central, expect, model | 0.000049 | Tesla |
6 | Share price | 5.5 | price, share, target, catalyst, selloff, solar | 0.000660 | Starbucks |
7 | Trade | 5.3 | group, trade, Micron, solid, quarter, wrong | 0.065879 | Micron |
8 | Biotech-Pharma | 5.0 | pharma, posit, dividend, ready, Abbvie, biotech | 0.000046 | AbbVie, Netflix |
9 | Health industry | 4.4 | industry, pharmaceut, global, better, start, health | 0.008343 | Intel |
10 | China and America | 5.1 | update, future, money, resources, America, Chinese | 0.010947 | Progenics |
11 | Technology growth | 5.3 | potent, growth, technology, history, partner, remain | 0.019210 | Merck |
12 | Holding capital | 4.8 | upside, ahead, hold, copied, return, term | 0.005370 | Vertex |
13 | Investment opportunity | 6.8 | opportune, value, buy, offer, trade, point | 0.001507 | |
14 | Energy growth in China | 4.0 | energy, review, growth, reward, China, rais | 0.000059 | |
15 | International healthcare sales | 4.8 | invest, reason, sale, intern, Alibaba, science | 0.003980 | Alibaba, Gilead |
16 | Strong brand | 6.0 | strong, Facebook, result, brand, growth, Chespeake | 0.000035 | Facebook, Chesapeake |
17 | Portfolio | 4.8 | portfolio, posit, growth, acquit, medic, become | 0.006085 | Pinduoduo |
18 | Forecast | 4.7 | continue, short, look, higher, product, pipeline | 0.000230 | |
19 | Biotech-Tech | 4.8 | Apple, biotech, great, thing, investor, grow | 0.000094 | Apple |
20 | Legal issue | 3.6 | momentum, Nvidia, profit, Sonos, weak, expands | 0.006453 | Nvidia, Sonos |
No. | Topic Name | Topic% | Keyword Stems | Risk Assessment |
---|---|---|---|---|
1 | Cancer immunotherapy | 4.0 | immune, cancer, kill, milestone, buyout, index, Canadian, ship, mutual, agreement | 0.012008 |
2 | Bitcoin investment | 9.5 | right, value, investor, shareholder, bitcoin, board, portfolio, market, current, crypto | 0.004822 |
3 | Video technology | 4.7 | video, high, technic, signal, Youtube, close, target, report, California, potenti | 0.001796 |
4 | Earnings & Dividends | 7.5 | earn, dividend, yield, growth, buy, estimate, expect, equity, bottom, winner | 0.007154 |
5 | Stock price movement | 15.5 | share, price, market, go, posit, short, trade, increase, rais, higher | 0.017160 |
6 | Technology announcement | 13.1 | busy, system, Saudi, technology, industry, problem, base, announces, potency, project | 0.019387 |
7 | Stock consultant analysis | 5.3 | analysis, stock consult, watches, support, breakout, bound, bullish, strong, rang, stat | 0.000011 |
8 | Oil stocks under natural environment | 6.3 | expect, price, bullish, crude, forecast, analyst, storm, product, bearish, hurricane | 0.019863 |
9 | Brand contribution | 3.1 | income, swingstocktrad, figure, update, user, really, brand, risk, Germany, selloff | 0.010922 |
10 | Financial derivatives | 5.5 | trade, shareplann, report, swing, strategi, call, option, video, finance, good | 0.002698 |
11 | Social media market opportunity | 5.5 | revenue, video, Facebook, margin, study, Twitter, anyone, sale, friend, guidance | 0.000025 |
12 | China-U.S. trade tariffs | 12.2 | China, heisenbergreport, Trump, market, trade, tariff, economy, interest, rate, break | 0.022065 |
13 | Algorithm-based stock trading pattern | 7.9 | level, trade, market, chart, profit, start, move, earn, growth, pattern | 0.009371 |
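The Topic% column gives each topic's share of the corpus, but this section does not state how it was computed. LDAvis-style visualizations, which match the topic visualization step named in Section 3.2, define a topic's share as its token-weighted marginal frequency: for each topic, sum document length times that document's topic proportion over all documents, then normalize by the total. The following is a minimal pure-Python sketch assuming Topic% was derived this way; the function name and toy inputs are hypothetical.

```python
def topic_shares(doc_topic, doc_lengths):
    """Return each topic's share of all tokens, in percent.

    Assumption: Topic% is the LDAvis-style marginal topic frequency.
    doc_topic[d][k] is the LDA topic proportion theta_{d,k} of document d;
    doc_lengths[d] is the number of tokens in document d.
    """
    n_topics = len(doc_topic[0])
    # Expected number of tokens each topic accounts for across the corpus.
    freq = [0.0] * n_topics
    for theta, length in zip(doc_topic, doc_lengths):
        for k, p in enumerate(theta):
            freq[k] += p * length
    total = sum(freq)
    # Normalize so the shares sum to 100.
    return [100.0 * f / total for f in freq]


# Hypothetical toy corpus: three documents, two topics.
doc_topic = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
doc_lengths = [100, 50, 50]
print(topic_shares(doc_topic, doc_lengths))  # about [62.5, 37.5]
```

Weighting by document length means a long post influences a topic's share more than a one-line comment, which matters for UGC corpora where post lengths vary widely.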
Dependent variable: ARet(i,t+h), the abnormal return of stock i at day t + h, for h = 10, 5, 3, 2, 1.

Variable | ARet(i,t+10) Coeff. | t | ARet(i,t+5) Coeff. | t | ARet(i,t+3) Coeff. | t | ARet(i,t+2) Coeff. | t | ARet(i,t+1) Coeff. | t |
---|---|---|---|---|---|---|---|---|---|---|
Constant | −0.005 ** | −3.31 | −0.005 *** | −2.61 | 0.002 | 0.88 | 0.001 | 0.36 | −0.005 ** | −2.34 |
UGCArticle_k(i,t) | −0.217 ** | −2.37 | −0.109 | −0.43 | −0.165 | −0.66 | −0.294 | −1.08 | −0.258 | −0.95 |
MONTH | Yes | | Yes | | Yes | | Yes | | Yes | |
Observations | 4376 | | 4376 | | 4376 | | 4376 | | 4376 | |
R² | 0.003 | | 0.003 | | 0.0002 | | 0.0003 | | 0.003 | |
Adjusted R² | 0.002 | | 0.002 | | −0.0003 | | −0.0002 | | 0.003 | |
Res. Std. Error (df = 21,961) | 0.048 | | 0.056 | | 0.055 | | 0.059 | | 0.059 | |
F statistic (df = 8; 21,961) | 6.167 *** | | 5.665 *** | | 0.356 | | 0.582 | | 6.946 *** | |
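Each column pairs a UGC feature observed on day t with the abnormal return of the same stock h days later (h = 10, 5, 3, 2, 1). This section does not define the abnormal-return model, so the sketch below assumes the simple market-adjusted return (stock return minus market return on the same day) purely for illustration. Both helper names are hypothetical, and in a panel the pairing would be done separately for each stock i.

```python
def market_adjusted_returns(stock_returns, market_returns):
    # Assumed abnormal-return model: stock return minus market return, day by day.
    return [r - m for r, m in zip(stock_returns, market_returns)]


def lead_pairs(feature, abnormal, h):
    # Pair the day-t feature value with the abnormal return on day t + h.
    # Days whose t + h falls outside the sample are dropped.
    n = min(len(feature), len(abnormal))
    return [(feature[t], abnormal[t + h]) for t in range(n - h)]


# Hypothetical daily data for one stock.
stock = [0.010, -0.020, 0.030, 0.000]
market = [0.005, -0.010, 0.010, 0.002]
ar = market_adjusted_returns(stock, market)
ugc_feature = [0.2, 0.5, 0.1, 0.4]  # hypothetical day-t UGC feature values
print(lead_pairs(ugc_feature, ar, h=2))
```

Longer horizons leave fewer usable pairs at the end of each stock's sample, so in practice the effective sample can differ slightly across columns unless it is trimmed to a common window.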
Dependent variable: ARet(i,t+h), the abnormal return of stock i at day t + h, for h = 10, 5, 3, 2, 1.

Variable | ARet(i,t+10) Coeff. | t | ARet(i,t+5) Coeff. | t | ARet(i,t+3) Coeff. | t | ARet(i,t+2) Coeff. | t | ARet(i,t+1) Coeff. | t |
---|---|---|---|---|---|---|---|---|---|---|
Constant | 0.002 | 0.94 | 0.002 | 0.72 | 0.004 | 1.59 | 0.0004 | 0.16 | 0.002 | 0.63 |
UGCTalk_k(i,t) | −0.810 *** | −3.28 | −0.781 *** | −3.25 | −0.660 *** | −2.78 | −0.336 | −1.42 | −0.323 | −1.28 |
MONTH | Yes | | Yes | | Yes | | Yes | | Yes | |
Observations | 4881 | | 4881 | | 4881 | | 4881 | | 4881 | |
R² | 0.002 | | 0.002 | | 0.002 | | 0.001 | | 0.0004 | |
Adjusted R² | 0.002 | | 0.002 | | 0.001 | | 0.0001 | | −0.00004 | |
Res. Std. Error (df = 21,961) | 0.071 | | 0.069 | | 0.068 | | 0.068 | | 0.072 | |
F statistic (df = 8; 21,961) | 5.377 *** | | 5.899 *** | | 3.963 *** | | 1.327 | | 0.909 | |
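Both regression tables relate the abnormal return h days ahead to a UGC knowledge feature while controlling for MONTH. The sketch below shows that kind of specification for a single regressor with month fixed effects and classical (homoskedastic) standard errors. These are assumptions: the source's full control set and error structure are not shown here, so the sketch will not reproduce the reported coefficients. It uses the within transformation. By the Frisch–Waugh–Lovell theorem, demeaning both variables inside each month and regressing one on the other gives the same slope and residuals as including one dummy per month. The function name is hypothetical.

```python
import math
from collections import defaultdict


def month_fe_slope(y, x, month):
    """OLS slope of y on x with month fixed effects, and its classical t-statistic.

    Assumptions: one regressor, one intercept per month, homoskedastic errors.
    """
    # Group observation indices by month.
    groups = defaultdict(list)
    for i, m in enumerate(month):
        groups[m].append(i)

    # Within transformation: subtract each month's mean from y and x.
    y_dm = [0.0] * len(y)
    x_dm = [0.0] * len(x)
    for idx in groups.values():
        y_bar = sum(y[i] for i in idx) / len(idx)
        x_bar = sum(x[i] for i in idx) / len(idx)
        for i in idx:
            y_dm[i] = y[i] - y_bar
            x_dm[i] = x[i] - x_bar

    sxx = sum(v * v for v in x_dm)
    beta = sum(a * b for a, b in zip(x_dm, y_dm)) / sxx

    # Residual degrees of freedom: n minus the slope minus one mean per month.
    dof = len(y) - 1 - len(groups)
    ssr = sum((b - beta * a) ** 2 for a, b in zip(x_dm, y_dm))
    se = math.sqrt(ssr / dof / sxx)
    return beta, beta / se


# Hypothetical toy data: two months, three observations each.
month = [1, 1, 1, 2, 2, 2]
x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
y = [1.0, 2.1, 2.9, 5.0, 6.0, 7.0]
print(month_fe_slope(y, x, month))  # roughly (0.975, 25.53)
```

The month means absorb the large level difference between the two toy months, so the slope reflects only within-month co-movement of the feature and the return. The degrees-of-freedom correction subtracts one parameter per month as well as the slope, which is what a regression with a full set of month dummies would use.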
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, N.; Chen, K.; He, H. UGC Knowledge Features and Their Influences on the Stock Market: An Empirical Study Based on Topic Modeling. Information 2022, 13, 454. https://doi.org/10.3390/info13100454