1. Introduction
Public opinion was an early concern of political science. Lowell held that public opinion comprises people's opinions on actual events together with their subjective ideas. The rapid development of the Internet has significantly changed people's lives, and it has also opened new opportunities for studying public opinion in its various forms [1]. Wang argues that online public opinion consists of the viewpoints or topics that arise from public opinion events and spread through self-media channels, and that people generate many different viewpoints or topics around any specific public opinion event [2].
Prior research has investigated the mining of public opinion events across various domains, from both trend-analysis and technical perspectives.
Chen pointed out that online public opinion reflects people's social and political attitudes, and that predicting and evaluating trends in online public opinion is important for managers' decision-making [3]. Hassani et al. employed social trend mining techniques to investigate social dynamics and emerging patterns, extracting event trends from time series data gathered from social media platforms and search engines [4].
Several studies have approached event mining from a machine learning perspective. One study proposed an improved LDA model capable of sentiment discrimination and analyzed, over time, the sentiment intensity of the topics associated with different events, enabling effective analysis of online public opinion on campus [5]. Another study used K-means, TF-IDF, and HMM methods to examine a case of rapid public health policy adaptation in China during the COVID-19 epidemic [6]. K-means clustering and the Baidu Application Programming Interface Gateway have also been used to explore why a routine government notice triggered a series of unexpected public opinion crises; the results show that the way the government releases information and issues clarifications significantly affects public risk perception and emotion [7]. Weng et al. applied an event mining algorithm based on wavelet signal clustering (EDCoW) to a large volume of event information from Twitter, constructing signals from word frequencies and filtering out trivial words by inspecting those signals, which improved the efficiency of event mining [8].
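To make the word-signal idea concrete, the following is a simplified, hypothetical sketch rather than EDCoW itself: it builds a per-window document-frequency signal for each word and discards words whose signal is nearly flat, such as function words that appear at a constant rate. EDCoW additionally applies wavelet analysis to the signals and clusters them, which is omitted here; the function names `word_signals` and `filter_trivial` and the coefficient-of-variation threshold `min_cv` are our own illustrative choices.

```python
from collections import Counter


def word_signals(windows):
    """Build one signal per word: the fraction of posts in each time window containing it.

    windows: list of time windows, each a list of tokenized posts (lists of str).
    """
    vocab = sorted({w for win in windows for post in win for w in post})
    signals = {w: [] for w in vocab}
    for win in windows:
        # Document frequency: count each word at most once per post.
        df = Counter(w for post in win for w in set(post))
        n_posts = len(win) or 1  # guard against empty windows
        for w in vocab:
            signals[w].append(df[w] / n_posts)
    return signals


def filter_trivial(signals, min_cv=1.0):
    """Keep words whose signal varies over time (hypothetical flatness criterion).

    The coefficient of variation (std / mean) is near zero for words used at a
    constant rate, so those words are treated as trivial and dropped.
    """
    kept = {}
    for w, s in signals.items():
        mean = sum(s) / len(s)
        if mean == 0:
            continue
        std = (sum((x - mean) ** 2 for x in s) / len(s)) ** 0.5
        if std / mean >= min_cv:
            kept[w] = s
    return kept


# Toy data: three time windows of two short posts each.
windows = [
    [["the", "weather", "today"], ["the", "game"]],
    [["the", "paper", "retracted"], ["the", "retracted", "data"]],
    [["the", "lunch"], ["the", "weather"]],
]
kept = filter_trivial(word_signals(windows), min_cv=1.0)
print(sorted(kept))  # → ['data', 'game', 'lunch', 'paper', 'retracted', 'today']
```

The constant-rate word "the" and the evenly spread "weather" are filtered, while the burst of "retracted" survives. Rare one-off words such as "game" also survive this crude test, which is why a practical detector needs further stages like the signal clustering that EDCoW performs.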
Other studies approach events from a semantic analysis standpoint, aiming to explore relationships between events or to perform sentiment analysis. A public opinion monitoring mechanism built around a semantic descriptor based on natural language processing algorithms was applied to tweet datasets from the 2016 and 2020 US presidential elections to produce succinct descriptions of public opinion [9]. Habibabadi et al. used natural language processing techniques to extract mentions of vaccine adverse events from Twitter in order to gain early insight into vaccine safety issues [10]. Nallapati et al. used event modeling to capture the rich structure of events and their correlations within news topics, addressing the loss of information that occurs when news stories are organized by topic into a flat hierarchy [11].
The studies above have analyzed public opinion in many fields using a variety of natural language processing techniques and event extraction methods, but few have examined public opinion in the context of research integrity.
The issue of research integrity has persisted since the inception of scientific research. Historically, because of the constraints of traditional mass media, the problem was addressed predominantly within the scientific community itself, through self-evaluation and self-scrutiny. As the Internet has advanced, information spreads at unprecedented speed, and scientific and technological achievements increasingly attract widespread public attention. Oversight of research integrity is therefore gradually expanding from the scientific community to society as a whole. In addition to traditional research integrity allegations, Internet users may question the process and results of research by posting short texts on social media platforms. The events described in these posts frequently provoke widespread discussion and can shape public opinion. If not promptly addressed, they can significantly affect the oversight of research integrity and undermine the credibility of research institutions and scholars. Therefore, this paper proposes a framework based on TextCNN and a Mixed Event Extractor, designed to mine potential public opinion event elements related to suspected research integrity issues. The aim is to provide research managers with public opinion mining tools, thereby broadening the scope of their inspection and management efforts and strengthening the inspection system within the comprehensive accountability framework for research integrity.
This paper focuses on short texts on online social platforms that relate to potential public opinion events surrounding scientific integrity issues. Our model employs TextCNN [12] to distinguish potential public opinion events related to scientific integrity issues from ordinary text, and then identifies their key elements with a Mixed Event Extractor. TextCNN performs well at extracting shallow features from text, which makes it well suited to short text classification. Because scientific misconduct is rare, online short texts about potential public opinion events on scientific integrity issues are expected to make up only a small proportion of ordinary textual information. To mitigate the effect of this class imbalance on TextCNN, SMOTE is applied to the training set. The Mixed Event Extractor, which combines TF-IDF and TextRank, is intended to mine important information related to these potential public opinion events more comprehensively.
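The two supporting components named above can be illustrated with a minimal, standard-library sketch; TextCNN itself is not shown. `smote` performs SMOTE-style oversampling: each synthetic sample is placed at a random point on the segment between a minority-class vector and one of its k nearest minority neighbours. `mixed_keywords` shows one plausible way to mix TF-IDF and TextRank: both score sets are min-max normalised and combined linearly with a weight `alpha`. How the paper actually combines the two scores is not stated here, so the function names, the linear mixing rule, `alpha`, the co-occurrence window, and the smoothed IDF formula (the form scikit-learn uses by default) are our assumptions. In practice SMOTE would operate on numeric representations of the training texts (e.g., embeddings), not on raw strings.

```python
import math
import random
from collections import Counter, defaultdict


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of at least two numeric feature vectors of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x, excluding x itself.
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def tfidf_scores(doc, corpus):
    """Smoothed TF-IDF score of each token in doc relative to corpus."""
    tf = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (count / len(doc)) * idf
    return scores


def textrank_scores(doc, window=2, damping=0.85, iters=50):
    """TextRank: PageRank iterations over an undirected word co-occurrence graph."""
    graph = defaultdict(set)
    for i, word in enumerate(doc):
        # Link each word to the words that follow it within the window.
        for j in range(i + 1, min(i + window, len(doc))):
            if doc[j] != word:
                graph[word].add(doc[j])
                graph[doc[j]].add(word)
    score = {w: 1.0 for w in set(doc)}
    for _ in range(iters):
        # Each word receives a damped share of its neighbours' previous scores.
        score = {
            w: (1 - damping)
            + damping * sum(score[u] / len(graph[u]) for u in graph[w])
            for w in score
        }
    return score


def _min_max(scores):
    """Rescale scores to [0, 1] so the two rankings are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {w: 1.0 for w in scores}
    return {w: (v - lo) / (hi - lo) for w, v in scores.items()}


def mixed_keywords(doc, corpus, alpha=0.5, top_k=3):
    """Rank tokens by a weighted mix of normalised TF-IDF and TextRank scores."""
    tfidf = _min_max(tfidf_scores(doc, corpus))
    rank = _min_max(textrank_scores(doc))
    mixed = {w: alpha * tfidf[w] + (1 - alpha) * rank[w] for w in tfidf}
    # Sort by descending mixed score, breaking ties alphabetically.
    return sorted(mixed, key=lambda w: (-mixed[w], w))[:top_k]


corpus = [
    ["professor", "data", "fabrication", "paper", "retracted"],
    ["weather", "today", "sunny"],
    ["paper", "deadline", "today"],
]
print(mixed_keywords(corpus[0], corpus, top_k=2))  # → ['data', 'fabrication']
print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=3))
```

In the toy example, TextRank alone would place "paper" among the top terms because of its central position in the post; the TF-IDF component pushes it down because it also appears in an unrelated post. This complementarity is the motivation for mixing the two signals, although the weighting used here is only illustrative.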