Public Opinion Evolution Based on the Two-Dimensional Theory of Emotion and Top2Vec-RoBERTa

Wang, Shaowen; Liu, Qingyang; Hu, Yanrong; Liu, Hongjiu

doi:10.3390/sym17020190


¹ College of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou 311300, China

² Institute of Informatics, Georg-August-Universität Göttingen, 37073 Göttingen, Germany

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Symmetry 2025, 17(2), 190; https://doi.org/10.3390/sym17020190

Submission received: 19 December 2024 / Revised: 15 January 2025 / Accepted: 23 January 2025 / Published: 26 January 2025

(This article belongs to the Special Issue Machine Learning and Data Analysis II)


Abstract

This paper applies the concept of symmetry to the design of a research methodology for public opinion evolution, emphasizing that both the construction and analysis processes of the method embody symmetrical principles. In today’s information age, dominated by social media, online platforms have become crucial venues for information dissemination. While the free flow of information promotes public participation, it also introduces certain challenges. Therefore, analyzing the evolution of public opinion and extracting public sentiment holds significant practical value for managing online public sentiment. This study takes the Zibo barbecue incident as a case study, utilizing the two-dimensional theory of emotion and Top2Vec for thematic analysis of public opinion comments. By combining sentiment dictionary methods with the RoBERTa model, we conduct a sentiment polarity analysis of public opinion comments. The results show that the RoBERTa model achieved an accuracy of 98.46% on the test set. The proposed method effectively uncovers public sentiment biases and the influencing factors on public emotions during the evolution of public opinion events, providing a more comprehensive understanding of the emotional dynamics throughout the development of public sentiment. This deeper insight aids in addressing issues related to public opinion more effectively.

Keywords:

BiliBili; sentiment analysis; public opinion evolution; Top2Vec; RoBERTa; two-dimensional theory of emotion

1. Introduction

In today’s information age, dominated by social media, online platforms have become vital venues for information dissemination. Every day, hundreds of millions of users share various types of information on these platforms, including content about numerous high-profile events. This information often contains rich personal opinions and emotional biases. The rise of social media has made information dissemination faster and more widespread: a single post or comment can spark extensive discussion and reaction within a short period. This rapid spread and large-scale interaction have transformed the way information circulates. They have also profoundly influenced the formation and evolution of public opinion. Through posting, commenting, and sharing, users express their personal stances. Together, these actions create a complex information ecosystem that has brought about unprecedented social changes [1]. While the free flow of information promotes public participation, it also brings challenges. Information on social media may include false or misleading content, and even malicious or hate speech, all of which pose potential threats to public opinion and social stability [2]. Therefore, in this dynamic and ever-changing environment, understanding and harnessing the power of social media has become a crucial topic in the information age [3].

In the context of public opinion evolution, the concept of symmetry plays a central role in understanding the balance and dynamics of emotional responses. In emotion analysis, symmetry here refers to the equilibrium between the valence and arousal dimensions. This study applies the concept of symmetry to both the design and the analysis of the research methodology for public opinion evolution. Specifically, we examine how emotional responses during public opinion events, such as the “Zibo Barbecue Incident”, exhibit symmetrical patterns at different stages. By capturing the balance between emotional intensity and polarity, we gain deeper insight into the emotional dynamics of public sentiment. This yields a more comprehensive understanding of how public opinion evolves over time.

The “Zibo Barbecue Incident” not only garnered widespread attention in a short period but also sparked intense public discussion on online platforms. Zibo barbecue first gained attention on Douyin. During the COVID-19 pandemic, the Zibo local government’s warm hospitality toward quarantined university students, including farewell barbecue banquets, laid the groundwork for its popularity. After the pandemic, university students began organizing group trips to Zibo by high-speed train to taste its barbecue and shared their experiences on social media, gradually bringing Zibo barbecue into the spotlight. Its unique features provided visitors with a distinctive dining experience. These included “large meat skewers, flatbread, and scallions” and the ceremonial dining style of “one table, one grill, and one rolled pancake”. The Zibo municipal government and local businesses also contributed significantly to the widespread popularity of Zibo barbecue. Their combined efforts included market regulation, maintaining merchants’ reputations, and launching dedicated transportation routes.

This paper takes the “Zibo Barbecue Incident” as a case study. Python web scraping was used to collect comments on related videos from the BiliBili platform as the dataset. Topic mining was performed using Top2Vec. A custom sentiment dictionary, built on the Dalian University of Technology Sentiment Dictionary, was used to annotate the texts, and the resulting sentiment classification was evaluated using the RoBERTa model. Following lifecycle theory, the public opinion dissemination cycle was divided into initiation, outbreak, decline, and resolution stages. Based on the two-dimensional theory of emotion, comments were classified along the valence and arousal dimensions, and topics were extracted within each dimension. Finally, the evolution of public opinion was analyzed through three lenses: the evolution of sentiment means, two-dimensional topic analysis, and the evolution of comment popularity. This approach provides valuable insights for understanding and addressing related issues and offers a useful reference for public opinion analysis in similar events.

The innovations of this study are as follows:

(1) Traditional sentiment analysis typically focuses only on the categories of emotions, neglecting the intensity of emotions. This study integrates the two-dimensional theory of emotion, analyzing changes in different emotional states from the dimensions of valence and arousal. This approach allows for a more comprehensive capture and understanding of the emotional dynamics in the evolution of public opinion.

(2) By combining a sentiment dictionary with deep learning models, this study addresses the low efficiency of traditional manual annotation. The sentiment dictionary provides rich prior knowledge for identifying emotional tendencies in the text. The deep learning model, learning automatically from large datasets, further enhances classification accuracy and generalization ability. Training the model on the dictionary-annotated data allows the sentiment classification results to be evaluated and validated more comprehensively.

Section 2 reviews existing research methods for online public opinion analysis. Section 3 outlines the research framework and methodology of this study. Section 4 presents the research process and analyzes the results. Section 5 concludes the study.

2. Related Research

Compared to traditional public opinion, online public opinion relies on major information exchange platforms, offering a broader reach that can connect with groups across different regions, age groups, professions, and interests. Traditional public opinion is limited by geography and media channels, whereas online public opinion transcends these limitations, often exerting a more extensive influence [4]. Analyzing comment data on social media can help us understand public sentiment and the patterns of public opinion evolution [5]. In recent years, sentiment classification and topic mining have been widely applied and extensively researched in this field [6].

2.1. Research on Sentiment Classification Methods

Sentiment classification methods can be categorized into fine-grained and coarse-grained approaches. Fine-grained analysis involves sentiment classification methods based on emotional polarity and intensity using sentiment dictionaries. Sentiment classification methods based on sentiment dictionaries use pre-constructed lexicons to determine the sentiment orientation of the text by matching and analyzing the emotional vocabulary within the text. Constructing a sentiment dictionary involves filtering and categorizing vocabulary to create a lexicon that accurately reflects emotional nuances. Zhang et al. expanded sentiment dictionaries by extracting and constructing related dictionaries, such as network terminology dictionaries and negation dictionaries, to enhance topic monitoring on Weibo [7]. Nie et al. proposed a method that combines semantic mapping functions with dictionary construction to capture the rich emotions hidden in hotel review texts [8]. Liu et al. combined sentiment dictionaries with pre-trained word embeddings and used TF-IDF values for weighting. By calculating the weights of sentiment words and neutral words separately and highlighting the role of sentiment words in sentence vectors, they improved the accuracy of sentiment analysis [9].
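The sketch below illustrates how a negation dictionary can extend a sentiment lexicon. It flips a sentiment word’s score when an odd number of negators appears in a short window before it. The toy lexicon, the negator set, the 2-token window, and the function name `score_with_negation` are hypothetical. The rule is one common heuristic, not necessarily the one used by Zhang et al. [7]. English tokens stand in for segmented Chinese text.

```python
# Hypothetical toy resources; a real system would load full dictionaries.
SENTIMENT_LEXICON = {"good": 1.0, "tasty": 2.0, "bad": -1.0, "expensive": -1.0}
NEGATORS = {"not", "never", "no"}


def score_with_negation(tokens, lexicon=SENTIMENT_LEXICON, negators=NEGATORS, window=2):
    """Sum lexicon scores over a token list.

    A sentiment word's score is sign-flipped when an odd number of negators
    occurs among the `window` tokens immediately preceding it.
    """
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in lexicon:
            continue
        preceding = tokens[max(0, i - window):i]
        negations = sum(1 for t in preceding if t in negators)
        sign = -1.0 if negations % 2 == 1 else 1.0
        total += sign * lexicon[token]
    return total


print(score_with_negation(["food", "is", "not", "good"]))
```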

Coarse-grained text sentiment analysis methods involve using machine learning or deep learning techniques to classify the overall sentiment of the entire text.

Machine-learning-based sentiment classification methods learn patterns and features of emotional expression by training on large labeled datasets. They extract text features and apply classifiers to identify the sentiment of a text. Stefanis et al. explored the emotions related to daily COVID-19 monitoring reports posted on Facebook pages and used machine learning algorithms to predict sentiment classifications [10]. Rahman et al. proposed a multilayer classification model that employs supervised machine learning techniques, achieving better recall rates in sentiment classification tasks [11]. Hokijuliandy et al. combined SVM classification with chi-square feature selection for sentiment analysis. Their analysis of user comments revealed the main trends in positive reviews [12].
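The chi-square feature selection step mentioned above can be sketched as follows. Each term is scored by the chi-square statistic of its 2×2 term/class contingency table, and the top-scoring terms are kept as classifier features. The function names and toy data are hypothetical, and the SVM stage is omitted. In scikit-learn, `SelectKBest` with the `chi2` score function followed by `LinearSVC` is a common way to build such a pipeline. Note, however, that `chi2` computes its statistic from feature values rather than from this presence/absence table.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class without the term
    n00: documents outside the class without the term
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return numerator / denominator if denominator else 0.0


def select_terms(docs, labels, target, k):
    """Return the k terms with the highest chi-square score for class `target`.

    docs: list of token lists; labels: class labels of the same length.
    Ties are broken alphabetically so the result is deterministic.
    """
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    in_class = sum(1 for y in labels if y == target)
    out_class = len(labels) - in_class
    scores = {}
    for term in vocab:
        n11 = sum(1 for d, y in zip(doc_sets, labels) if y == target and term in d)
        n10 = sum(1 for d, y in zip(doc_sets, labels) if y != target and term in d)
        scores[term] = chi_square(n11, n10, in_class - n11, out_class - n10)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [["tasty", "cheap"], ["tasty", "fun"], ["expensive", "slow"], ["expensive", "fun"]]
labels = ["pos", "pos", "neg", "neg"]
print(select_terms(docs, labels, "pos", 2))
```

Chi-square rewards strong association in either direction. A term concentrated in the other class (here “expensive”) therefore scores as highly as one concentrated in the target class.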

Deep learning models use word embedding techniques (such as Word2Vec and GloVe) to simplify feature engineering and capture semantic information. They employ several architectures:

- recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) to handle sequential data;
- convolutional neural networks (CNNs) to capture local features [13];
- attention mechanisms;
- Transformer models (such as BERT) to capture global dependencies [14], using large-scale pre-training and fine-tuning to enhance generalization ability.

Sisi et al. used a CNN model, combining encoded emotional sequence features with traditional word embedding features for email sentiment classification [15]. Arbane et al. used Bi-LSTM to reveal various issues related to COVID-19 public opinion, aiming to understand people’s concerns during the pandemic [16]. Pota et al. used the BERT model to evaluate the impact of tweet pre-processing operations on sentiment analysis performance. They considered available data in two languages (English and Italian) to assess language dependency [17]. He et al. proposed a BERT-CNN-BiLSTM-Att hybrid model for text sentiment analysis, addressing issues of ambiguity and feature extraction in the sentiment analysis process [18].

2.2. Research on Topic Mining Methods

Topic mining, as an important technique in natural language processing, aims to discover hidden semantic structures and thematic information in text data. The traditional topic model, Latent Dirichlet Allocation (LDA), was proposed by Blei et al. It represents each document as a probability distribution over topics and each topic as a probability distribution over words. It has been widely applied to thematic analysis and document summarization tasks [19]. Zhao et al. used word frequency statistics and LDA methods to identify key terms related to tourism in Nanjing, thereby promoting tourism development [20]. Uthirapathy et al. used the LDA method to identify topics related to climate change in an existing Twitter dataset of public discussions [21]. Yoo et al. utilized LDA and Word2Vec algorithms to extract papers related to specific keywords from research on COVID-19 and identified detailed topics [22].

In summary, there are some limitations in existing research. Current mainstream topic mining methods mainly rely on LDA models and Word2Vec technology. However, these methods have limited capabilities in understanding complex semantic relationships, particularly when dealing with unstructured social media text, where capturing deep semantic information is challenging. Although deep-learning-based methods provide various metrics to evaluate model performance, they often require cumbersome manual annotation, which is a massive engineering task for large-scale data and carries a significant degree of subjectivity. Additionally, relying solely on models for sentiment analysis lacks theoretical support and has lower credibility.

Therefore, this study develops a public opinion topic analysis framework based on the two-dimensional theory of emotion and lifecycle theory, using the Top2Vec topic mining method. In addition, it combines a sentiment dictionary with the RoBERTa model to perform sentiment polarity analysis on public opinion comments. The sentiment dictionary is used to calculate sentiment values and perform initial sentiment classification, while the RoBERTa model is used to evaluate the accuracy of that classification.

3. Materials and Methods

3.1. Research Framework

This study uses the “Zibo Barbecue Incident” as a case study, selecting comment data from videos related to the event on the BiliBili platform to construct a text dataset. It proposes a public opinion evolution research method based on the two-dimensional theory of emotion and the Top2Vec-RoBERTa model. The overall research framework is illustrated in Figure 1 and includes five stages: data collection and pre-processing, division of the public opinion dissemination cycle, sentiment analysis, topic extraction, and public opinion evolution analysis.

Data Collection and Pre-processing: Crawling comments from videos related to the “Zibo Barbecue Incident” on the BiliBili platform, followed by text pre-processing such as removing irrelevant comments, deleting duplicates, and eliminating emojis. The jieba segmentation tool and a custom vocabulary are used to segment Chinese sentences, with a stopword list from Sichuan University applied to obtain the text dataset.
Lifecycle Classification: Dividing the specific stages of public opinion based on changes in the volume of comments over time, according to the lifecycle method.
Sentiment Analysis: Constructing a custom sentiment dictionary based on the Dalian University of Technology Sentiment Dictionary. Sentiment values are calculated for the comment corpus related to key figures in the public opinion, and sentiment polarity is annotated. The RoBERTa model is used to evaluate the sentiment classification performance of the dictionary, resulting in the identification of public emotional attitudes.
Topic Extraction: Categorizing comments along the valence and arousal dimensions based on the two-dimensional theory of emotion and using the Top2Vec model to identify topics within each dimension.
Public Opinion Evolution Analysis: Analyzing sentiment mean evolution, two-dimensional topic analysis, and comment popularity evolution to perform a comprehensive analysis of public opinion evolution.

3.2. Research Methods

3.2.1. Data Pre-Processing Methods

Text data pre-processing includes removing duplicate and irrelevant texts, eliminating emojis, tokenization, and stopword removal. Words are the smallest semantic units in text, and the accuracy of text segmentation directly affects the results of sentiment classification. Comparing the segmentation results against the stopword list and removing stopwords reduces the complexity of subsequent computations and improves classification performance. The completeness of the custom vocabulary also affects tokenization, because domain-specific terms missing from the segmenter’s vocabulary may be split incorrectly.
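Below is a minimal sketch of these pre-processing steps, assuming comments arrive as plain strings. The emoji character ranges, the function name `preprocess`, and the whitespace tokenizer used in the example are assumptions. With jieba installed, `jieba.lcut` could serve as the tokenizer after loading a custom vocabulary with `jieba.load_userdict`. Filtering of irrelevant comments is not shown.

```python
import re

# Rough emoji/pictograph ranges (an assumption, not an exhaustive emoji list).
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]+"
)


def preprocess(comments, tokenize, stopwords):
    """Strip emojis, drop empty or duplicate comments, tokenize, remove stopwords.

    `tokenize` is any callable that returns a list of tokens for a string.
    """
    seen = set()
    result = []
    for text in comments:
        cleaned = EMOJI_PATTERN.sub("", text).strip()
        if not cleaned or cleaned in seen:
            continue  # skip empty comments and exact duplicates
        seen.add(cleaned)
        tokens = [t for t in tokenize(cleaned) if t.strip() and t not in stopwords]
        if tokens:
            result.append(tokens)
    return result


print(preprocess(["great food 😀", "great food", "the bbq"], str.split, {"the"}))
```

Duplicates are detected after emoji removal, so two comments that differ only in emojis count as one.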

3.2.2. Two-Dimensional Theory of Emotion

The Two-Dimensional Theory of Emotion, proposed by Russell [23], is an important model in emotional psychology, as shown in Figure 2. This theory suggests that emotional states can be described using two fundamental dimensions: valence and arousal. Valence represents the positive or negative polarity of an emotion, ranging from extreme pleasure (such as joy and happiness) to extreme displeasure (such as sadness and anger). Arousal indicates the level of activation of an emotion, ranging from high activation (such as excitement and anger) to low activation (such as calmness and fatigue).

In sentiment analysis, the Two-Dimensional Theory of Emotion provides an effective method to refine emotional categories. By analyzing the valence and arousal scores of emotional vocabulary in the text, we can more accurately assess the emotional state of the text. For example, although “happiness” and “anger” are both high-arousal emotions, the former has a positive valence, while the latter has a negative valence. In this way, the Two-Dimensional Theory of Emotion not only helps to identify the basic polarity of emotions (positive or negative) but also provides insights into the specific nature and intensity of emotions.

3.2.3. Top2Vec

Top2Vec [24] is an algorithm for topic modeling and semantic search. It automatically detects topics in text and generates vector representations of them through joint embedding, dimensionality reduction, and clustering. The algorithm uses techniques such as Doc2Vec to create joint embeddings of documents and words. It then applies UMAP (Uniform Manifold Approximation and Projection) [25] to reduce the dimensionality of the document vectors. In the high-dimensional embedding space these vectors are sparsely distributed, so reduction maps them to a more manageable low-dimensional space. This helps identify dense regions in the data, where similar documents cluster together. UMAP preserves the local structure of the data, so closely related documents remain close in the lower-dimensional space, facilitating more accurate topic detection and visualization.

Following dimensionality reduction, HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) [26] is applied to identify dense areas of semantically related documents. HDBSCAN clusters document vectors based on their density in the reduced space and marks documents in sparse regions as noise. It determines the number of clusters automatically, although it still requires parameters such as the minimum cluster size. Each dense cluster corresponds to a topic. The topic vector is computed as the centroid of the document vectors in that cluster, and the words whose vectors lie closest to the topic vector serve as the topic’s keywords.
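The topic-vector step can be sketched with NumPy. The sketch assumes that document and word vectors from a joint embedding, and cluster labels from HDBSCAN, are already available (toy arrays below). HDBSCAN implementations mark noise points with the label -1. The function names are hypothetical, and this is not the Top2Vec library’s API.

```python
import numpy as np


def topic_vectors(doc_vectors, cluster_labels):
    """Centroid of the document vectors in each cluster; noise (-1) is skipped."""
    topics = {}
    for label in sorted(set(cluster_labels.tolist())):
        if label == -1:
            continue
        members = doc_vectors[cluster_labels == label]
        topics[int(label)] = members.mean(axis=0)
    return topics


def topic_words(topic_vector, word_vectors, vocab, n=3):
    """Words whose vectors have the highest cosine similarity to the topic vector."""
    norms = np.linalg.norm(word_vectors, axis=1) * np.linalg.norm(topic_vector)
    sims = word_vectors @ topic_vector / norms
    order = np.argsort(-sims)[:n]
    return [vocab[i] for i in order]


docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [5.0, 5.0]])
labels = np.array([0, 0, 1, 1, -1])
words = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
topics = topic_vectors(docs, labels)
print(topic_words(topics[0], words, ["grill", "price", "city"], n=1))
```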

Compared with traditional LDA (Latent Dirichlet Allocation), Top2Vec automatically discovers the number of topics through HDBSCAN and does not require a stopword list. Apart from choosing model parameters, Top2Vec training requires minimal human intervention [27].

3.2.4. Sentiment Polarity Recognition Based on Sentiment Dictionary and RoBERTa

Constructing the Sentiment Dictionary

By using a sentiment dictionary to calculate the sentiment values of comments, the time cost of manual annotation can be reduced. This study uses the Dalian University of Technology Sentiment Dictionary as the foundational dictionary. Based on Ekman’s model [28], this dictionary refines positive emotions by incorporating a sentiment category of “good” and categorizes emotions into 7 major categories and 21 subcategories, with emotion intensity graded on a scale of 1, 3, 5, 7, and 9. There are seven types of parts of speech: noun, verb, adjective, adverb, network words, idioms, and prepositional phrases. The sentiment values are calculated using this dictionary to determine sentiment polarity. The format of the sentiment vocabulary is generally as shown in Table 1, with specific emotion classifications detailed in Table 2. One emotional word may correspond to multiple emotions.

RoBERTa Sentiment Polarity Classification

RoBERTa (Robustly Optimized BERT Pretraining Approach) is an improved version of BERT (Bidirectional Encoder Representations from Transformers), as shown in Figure 3. Unlike the original BERT model, RoBERTa introduces several improvements, including dynamic masking, removal of the Next Sentence Prediction (NSP) task, and larger batch sizes. These enhancements allow RoBERTa to learn better language representations and achieve superior performance on various natural language processing tasks.

One key difference between RoBERTa and BERT is the dynamic masking technique used during pre-training. In BERT, masking is static: the tokens to be masked are chosen once during data pre-processing, so a training instance is seen with the same mask across epochs. BERT partly mitigated this by duplicating the training data with several different masks. RoBERTa instead uses dynamic masking, re-randomizing the masked positions each time a sequence is fed into the model. This exposes the model to more diverse masking patterns and reduces the risk of memorizing a fixed mask.
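To make the contrast concrete, the sketch below draws a fresh mask each time it is called, as in dynamic masking. Static masking would amount to calling it once during pre-processing and reusing the result. The sketch follows the 15% selection rate and 80/10/10 replacement split used by BERT and RoBERTa. The function name, the token ids, and the use of -100 as an ignored-label marker (a PyTorch/Hugging Face convention) are assumptions for illustration.

```python
import random

MASK_PROB = 0.15  # fraction of positions selected for prediction, as in BERT/RoBERTa


def dynamic_mask(token_ids, mask_id, vocab_size, special_ids, rng):
    """Return (masked_ids, labels) with a freshly sampled mask on every call.

    Of the selected positions, about 80% become mask_id, 10% a random token,
    and 10% keep the original token. labels holds the original id at selected
    positions and -100 (a marker the loss should ignore) elsewhere.
    """
    masked = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= MASK_PROB:
            continue  # special tokens are never masked
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = mask_id
        elif roll < 0.9:
            masked[i] = rng.randrange(vocab_size)
        # otherwise the original token is left in place
    return masked, labels


rng = random.Random(0)
sequence = [0] + list(range(5, 105)) + [2]  # toy ids; 0 and 2 act as special tokens
first, _ = dynamic_mask(sequence, 4, 200, {0, 2}, rng)
second, _ = dynamic_mask(sequence, 4, 200, {0, 2}, rng)  # a different mask
```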

Additionally, RoBERTa drops the Next Sentence Prediction (NSP) task, which was originally part of BERT’s pre-training process. NSP trained the model to predict whether two text segments appeared consecutively in the original text. RoBERTa’s authors found that removing NSP matched or slightly improved downstream task performance. This allows pre-training to focus on the masked language modeling objective.

RoBERTa also benefits from the use of larger batch sizes during training. By increasing the batch size, RoBERTa is able to process more examples simultaneously, which helps to stabilize the gradient estimates and improves the model’s generalization ability. This enables the model to train more effectively on large datasets, which contributes to its enhanced performance.

Furthermore, RoBERTa changes BERT’s input format. Instead of constructing segment pairs for NSP, it packs contiguous full sentences into input sequences of up to 512 tokens and trains only on full-length sequences. This helps the model capture long-range dependencies and nuances in language, which is critical for tasks that require understanding context over longer spans of text.

In this study, the RoBERTa model is fine-tuned for sentiment polarity recognition. Like BERT, RoBERTa uses a bidirectional self-attention mechanism. This allows the model to consider both the preceding and the following context of each word, enhancing its understanding of context.

4. Results

4.1. Data Collection and Pre-Processing

This study employs Python web scraping to collect comment texts related to the “Zibo Barbecue Incident” from the BiliBili platform. The dataset was collected with the keywords “Zibo Explosion” and “Zibo Barbecue” and includes comments on BiliBili content from 2023-03-22 to 2023-07-01. Texts are segmented with the jieba tool, supplemented by a custom vocabulary, and a stopword list is used to filter out stopwords. After removing duplicate data and meaningless texts, a total of 17,873 valid comments are obtained.

4.2. Lifecycle Classification

The distribution of public opinion data over time is shown in Figure 4. Using the public opinion evolution cycle classification method, the data for the “Zibo Barbecue Incident” are divided into four stages:

- Initiation Stage: 2023-03-22 to 2023-04-07
- Outbreak Stage: 2023-04-08 to 2023-04-11
- Decline Stage: 2023-04-12 to 2023-05-06
- Resolution Stage: 2023-05-07 to 2023-07-01
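The stage boundaries above can be encoded as a simple date lookup. This sketch only assigns comments to the stages reported here. It does not reproduce how the boundaries were derived from comment volume in Figure 4. The names `STAGES` and `stage_of` are hypothetical.

```python
from datetime import date

# Inclusive start date of each lifecycle stage, in chronological order.
STAGES = [
    (date(2023, 3, 22), "Initiation"),
    (date(2023, 4, 8), "Outbreak"),
    (date(2023, 4, 12), "Decline"),
    (date(2023, 5, 7), "Resolution"),
]
END = date(2023, 7, 1)  # last day of the study window


def stage_of(day):
    """Map a comment date to its lifecycle stage, or None if outside the window."""
    if day < STAGES[0][0] or day > END:
        return None
    current = None
    for start, name in STAGES:
        if day >= start:
            current = name  # keep the latest stage whose start has passed
    return current


print(stage_of(date(2023, 4, 10)))
```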

4.3. Sentiment Analysis

4.3.1. Sentiment Calculation

In this study, sentiment scores and emotions for each text are calculated based on the vocabulary and corresponding emotion labels and scores from the Dalian University of Technology Sentiment Dictionary. First, each text is segmented into individual words or phrases using the jieba tokenizer. The segmented results are then matched with the vocabulary in the sentiment dictionary to extract all the matching sentiment words. The sentiment dictionary contains multiple emotional categories (such as joy, anger, sadness, etc.), with each word assigned a corresponding sentiment intensity score. For each matched sentiment word, the score is weighted according to its corresponding value in the dictionary. Based on the matched sentiment words and their corresponding scores, the total sentiment score for each text is calculated. Additionally, for each text, the different emotional categories and their frequencies are also counted. As shown in Table 3, the score for each sentiment word is weighted, and the final comprehensive sentiment score for the text is derived, along with the identification of the predominant emotional categories and their counts.
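Below is a minimal sketch of this scoring procedure. It assumes each dictionary entry maps a word to one or more (emotion category, intensity) pairs. The exact weighting behind Table 3 is not specified, so the sketch assumes a signed sum of intensities. The signs follow the high/low-valence grouping used in Section 4.3.3. The English words and scores in the example are placeholders, not entries from the Dalian University of Technology dictionary.

```python
from collections import Counter

# Assumed sign convention, taken from the valence grouping in Section 4.3.3.
POSITIVE = {"Happy", "Good", "Surprise"}
NEGATIVE = {"Sadness", "Anger", "Disgust", "Fear"}


def score_text(tokens, lexicon):
    """Return (signed sentiment score, emotion-category counts) for one comment.

    lexicon maps word -> list of (emotion_category, intensity) pairs, because one
    word may carry several emotions; intensities use the 1/3/5/7/9 scale.
    """
    score = 0
    counts = Counter()
    for token in tokens:
        for category, intensity in lexicon.get(token, []):
            counts[category] += 1
            if category in POSITIVE:
                score += intensity
            elif category in NEGATIVE:
                score -= intensity
    return score, counts


toy_lexicon = {"tasty": [("Happy", 5)], "cheat": [("Anger", 7), ("Disgust", 5)]}
print(score_text(["tasty", "cheat"], toy_lexicon))
```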

4.3.2. RoBERTa Model Sentiment Classification

This study uses the RoBERTa model to evaluate the accuracy of sentiment labels and further classify the comments annotated with the sentiment dictionary. The experimental environment for the research includes Windows, Jupyter Notebook as the development environment, Python 3.8 as the programming language, and TensorFlow 2.10.0 as the deep learning framework.

(1) Dataset splitting: From the corpus, 20% is randomly selected as the test set. Then, 20% is randomly selected from the remaining 80% of the dataset to form the validation set, with the remaining portion used as the training set. Overall, the dataset is divided into test, validation, and training sets in a ratio of 0.2:0.16:0.64.
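The two-step split can be sketched with the standard library. The seed and function name are assumptions. Calling scikit-learn’s `train_test_split` twice would be an equivalent alternative.

```python
import random


def split_dataset(items, seed=42, test_frac=0.2, val_frac=0.2):
    """Hold out test_frac of the data, then val_frac of the remainder.

    With the defaults, the test/validation/training proportions are
    0.2 / 0.16 / 0.64 of the original data.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = int(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```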

(2) Input processing: To prepare the text data for RoBERTa, each input sentence is tokenized using the tokenizer provided with the pre-trained RoBERTa model. The tokenizer splits the text into subword tokens and maps them to integer indices in the RoBERTa vocabulary. Special tokens mark the sequence boundaries: [CLS] (classification token) is prepended and [SEP] (separator token) is appended, so that the model receives input in the expected format. Additionally, an attention mask is generated for each token, indicating which tokens should be attended to during processing and which should be ignored (e.g., padding tokens).
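The following sketch mirrors the input format described above using a toy vocabulary. [CLS] and [SEP] wrap the token ids, and the attention mask is 1 for real tokens and 0 for padding. The vocabulary ids and the function `encode` are hypothetical. In practice, calling a Hugging Face tokenizer with `padding="max_length"`, `truncation=True`, and `max_length` produces `input_ids` and `attention_mask` directly.

```python
def encode(tokens, vocab, max_len, cls="[CLS]", sep="[SEP]", pad="[PAD]", unk="[UNK]"):
    """Wrap tokens in [CLS] ... [SEP], map them to ids, and pad/truncate to max_len.

    Returns (input_ids, attention_mask): the mask is 1 for real tokens and 0 for
    padding. Unknown tokens map to the [UNK] id.
    """
    body = tokens[: max_len - 2]  # leave room for the two special tokens
    pieces = [cls] + body + [sep]
    ids = [vocab.get(t, vocab[unk]) for t in pieces]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    ids += [vocab[pad]] * padding
    mask += [0] * padding
    return ids, mask


toy_vocab = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102, "zibo": 1, "bbq": 2}
print(encode(["zibo", "bbq", "yum"], toy_vocab, 8))
```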

(3) Training procedure: The encoded data are fed into the RoBERTa model, which uses a multilayer bidirectional Transformer architecture to capture contextual information. The model is trained to minimize the cross-entropy loss between the predicted and actual sentiment labels. Early stopping was applied to prevent overfitting. After six epochs, the training accuracy continued to improve while the validation accuracy plateaued, indicating the onset of overfitting. Training was therefore capped at six epochs.
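The stopping rule can be expressed as a small function over the per-epoch validation accuracies. The patience value, the function name, and the toy accuracy values below are assumptions, not the settings or results of this study. In TensorFlow, `tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=..., restore_best_weights=True)` implements the same idea.

```python
def early_stopping_epoch(val_scores, patience=2, min_delta=0.0):
    """Return the 1-based epoch with the best validation score.

    Training stops once the score has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_score = float("-inf")
    best_epoch = 0
    waited = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score + min_delta:
            best_score, best_epoch, waited = score, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Toy validation accuracies that plateau after the sixth epoch.
print(early_stopping_epoch([0.80, 0.85, 0.88, 0.90, 0.91, 0.92, 0.919, 0.918]))
```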

(4) Performance metrics: Accuracy and loss curves for both the training and validation sets across all epochs are plotted and analyzed to monitor the model’s performance. These curves are depicted in Figure 5.

(5) Model evaluation and comparison: In order to assess the performance of the RoBERTa model and compare it with other models, we evaluated the accuracy of four different approaches: RoBERTa, BERT, LSTM, and BiLSTM. The results of these comparisons are summarized in Table 4.

As shown in Table 4 and Figure 5, the trained RoBERTa model achieved an accuracy of 98.67% on the validation set and 98.46% on the test set. It also showed superior precision, recall, and F1-score, indicating a good model fit. Because the reference labels were produced by the sentiment dictionary, this high accuracy indicates that the dictionary-based annotation of comment sentiment is consistent enough to be learned reliably, demonstrating its feasibility and practical value.
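For reference, the reported metrics can be computed as follows for a binary task. The labels in the example are toy values. In this study the reference labels come from the sentiment dictionary. It is not stated here whether Table 4 reports per-class or averaged scores, so this sketch computes precision, recall, and F1 for a single positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, plus precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(classification_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 1, 0, 1, 0]))
```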

4.3.3. Two-Dimensional Emotion Analysis

Based on the lifecycle and emotional valence, “Happy”, “Good”, and “Surprise” are categorized as high-valence emotions, while “Sadness”, “Anger”, “Disgust”, and “Fear” are categorized as low-valence emotions. The results are shown in Table 5 and Figure 6.

From the valence dimension, it can be observed that in the Initiation Stage, the proportion of high-valence and low-valence emotions in public comments is relatively low. This suggests limited emotional feedback, likely due to the event being in its early stages and attracting less public attention. During the Outbreak Stage, there is a significant increase in both high-valence and low-valence emotions, with proportions being nearly equal. This reflects a vigorous reaction and diverse public sentiment as the controversy intensifies. In the Decline Stage, both high-valence and low-valence emotions show a similar but lower proportion, indicating weakened emotional responses as the event wanes. In the Resolution Stage, high-valence emotions slightly surpass low-valence emotions. Although overall emotional feedback remains balanced, positive comments slightly dominate, which may indicate that the public feels relatively satisfied with the resolution of the issue.

Based on emotional arousal, “Good”, “Sadness”, and “Disgust” are categorized as low-arousal emotions, while “Happy”, “Anger”, “Surprise”, and “Fear” are categorized as high-arousal emotions. The results are shown in Table 6 and Figure 7.

For each arousal level, the percentages sum to 100% across the four stages. They therefore describe how high- and low-arousal comments are distributed over the lifecycle:

- Initiation Stage: this stage accounts for 3.39% of high-arousal comments and 5.39% of low-arousal comments. The larger share of low-arousal comments indicates a relatively calm public emotional response at this time.
- Outbreak Stage: the shares rise sharply to 36.29% for high-arousal and 37.86% for low-arousal comments, showing that the event triggered intense public attention and emotional reactions.
- Decline Stage: this stage holds the largest shares, 52.89% of high-arousal and 50.64% of low-arousal comments. The slightly larger high-arousal share suggests that public emotions remained agitated even as the event gradually faded. This stage also spans 25 days, compared with 4 days for the Outbreak Stage.
- Resolution Stage: the shares drop to 7.43% for high-arousal and 6.11% for low-arousal comments, reflecting a marked decrease in emotional engagement as discussion subsided after the event’s resolution.
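The shares in Table 6 can be reproduced from per-comment labels with a sketch like the one below. It assumes each comment is assigned a single dominant emotion category together with its lifecycle stage. The category-to-arousal mapping is the one given in this section. The function name and toy records are hypothetical.

```python
from collections import Counter

# Arousal grouping of the seven emotion categories, as defined in Section 4.3.3.
HIGH_AROUSAL = {"Happy", "Anger", "Surprise", "Fear"}
LOW_AROUSAL = {"Good", "Sadness", "Disgust"}


def arousal_shares(records):
    """records: iterable of (stage, emotion_category) pairs, one per comment.

    Returns {"high": {stage: percent}, "low": {stage: percent}}, where each
    level's percentages sum to 100 across stages.
    """
    counts = {"high": Counter(), "low": Counter()}
    for stage, category in records:
        if category in HIGH_AROUSAL:
            counts["high"][stage] += 1
        elif category in LOW_AROUSAL:
            counts["low"][stage] += 1
    shares = {}
    for level, by_stage in counts.items():
        total = sum(by_stage.values())
        shares[level] = {s: 100 * c / total for s, c in by_stage.items()} if total else {}
    return shares


toy = [("Outbreak", "Anger"), ("Decline", "Happy"), ("Decline", "Good")]
print(arousal_shares(toy))
```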

4.3.4. Evolution of Sentiment Mean

Figure 8 shows the evolution of the daily average sentiment value of public comments throughout the public opinion period. Sentiment values fluctuate considerably from day to day. During the Resolution Stage, the number of comments decreases sharply, and some dates have only a few comments. Means computed from so few comments can swing to extremes. Therefore, sentiment values from the Resolution Stage are excluded, and the analysis focuses on the stages with a higher volume of comments.
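The daily means, with the Resolution Stage excluded, can be sketched as follows, assuming each comment has a date and a sentiment score. The function name and toy data are hypothetical. An alternative design would drop only dates below a minimum comment count; the analysis here instead excludes the whole stage.

```python
from collections import defaultdict
from datetime import date

RESOLUTION_START = date(2023, 5, 7)  # first day of the Resolution Stage (Section 4.2)


def daily_sentiment_means(scored_comments):
    """scored_comments: iterable of (date, sentiment_score) pairs.

    Returns {date: mean score}, sorted by date, for dates before the
    Resolution Stage.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, score in scored_comments:
        if day >= RESOLUTION_START:
            continue  # sparse Resolution Stage dates are excluded
        sums[day] += score
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sorted(sums)}


print(daily_sentiment_means([(date(2023, 4, 8), 4), (date(2023, 4, 8), -2)]))
```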

By analyzing the sentiment means during the high-comment-volume stages, we can more clearly capture how public sentiment evolves throughout the event. Large swings in the sentiment mean coincide with specific dates in the event’s timeline. In the Initiation Stage, the public’s limited understanding of the event leads to diverse comments and noticeable positive and negative fluctuations in the sentiment mean. These polarized comments reflect public emotional uncertainty and incomplete information early in the event. As time progresses and more information is disclosed, public understanding of the event deepens. During the Outbreak Stage, the overall sentiment mean reaches its peak. This peak corresponds to positive feedback from people who, after the pandemic ended, experienced Zibo barbecue firsthand. The subsequent gradual stabilization of sentiment reflects the diminishing impact of the event.

4.4. Topic Analysis

Based on the Two-Dimensional Theory of Emotion, this study categorizes texts according to valence and arousal dimensions and uses Top2Vec for topic extraction. The distribution of topics and keywords under these two dimensions is shown in Table 7.
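Top2Vec (Angelov, 2020) embeds documents and words jointly, finds dense clusters of document vectors (via UMAP and HDBSCAN), and represents each topic by the centroid of its documents’ vectors; the topic’s keywords are the word vectors closest to that centroid. The minimal Python sketch below shows only that last step, assuming embeddings and cluster labels already exist. Running it once on the high-valence comments and once on the low-valence comments (and likewise for arousal) mirrors the per-dimension split in Table 7. The 2-D vectors, the words, and the function names are toy assumptions; the `top2vec` package performs the embedding, reduction, and clustering itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors):
    """Component-wise mean of a list of vectors (the topic vector)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def topic_words(doc_vectors, labels, word_vectors, top_n=2):
    """Map each cluster label to the top_n words nearest its centroid."""
    clusters = {}
    for vector, label in zip(doc_vectors, labels):
        if label == -1:  # HDBSCAN marks noise points with -1; they form no topic
            continue
        clusters.setdefault(label, []).append(vector)
    topics = {}
    for label, vectors in sorted(clusters.items()):
        topic_vector = centroid(vectors)
        ranked = sorted(word_vectors,
                        key=lambda w: cosine(topic_vector, word_vectors[w]),
                        reverse=True)
        topics[label] = ranked[:top_n]
    return topics


# Toy embeddings for one valence group (illustrative values only).
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
labels = [0, 0, 1, 1, -1]
words = {"taste": [1.0, 0.0], "atmosphere": [0.9, 0.2],
         "honest": [0.0, 1.0], "market": [0.2, 0.9]}
print(topic_words(docs, labels, words))
```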

From the perspective of valence, Topic 1 describes a pleasant dining experience with an overall positive emotional inclination. Topic 2 contains positive terms such as “honest” and “assured” alongside negative terms such as “short weight”, giving it a mixed emotional tendency. Topic 3 involves positive experiences related to the city and its marketing, with an overall positive inclination. Topic 4 includes both positive aspects such as “harmonious governance” and negative aspects such as “deceitful”, giving it a more complex emotional tone. This indicates that the quality of the dining experience and of market management strongly influences whether public emotion leans positive or negative.

From the perspective of arousal, Topic 1 mainly involves novel and special experiences (e.g., “first time”, “special”), with moderate emotional arousal and a degree of excitement. Topic 2 includes elements of surprise (“unexpectedly”, “exceeded expectations”), with higher arousal that may provoke stronger emotional reactions. Topic 3 has lower arousal, reflecting a calmer emotional state. Topic 4 primarily describes local characteristics and stable experiences, also with low arousal, conveying calm and satisfaction. Because the high-arousal Topic 2 also contains “integrity” and “prices”, this suggests that integrity and pricing strongly influence public emotional responses, highlighting the importance of better serving public needs.

From this, we conclude that public emotional experiences are diverse and are shaped by factors such as the dining experience and market management. Novel and special experiences can enhance positive emotions, while integrity and pricing strongly affect emotional responses, underscoring the importance of maintaining both freshness and integrity. Despite an overall positive emotional tendency, problems in market management still provoke negative emotions, indicating a need for further supervision. In addition, stable and reliable experiences convey calm and satisfaction, suggesting that stability and reliability are key to improving public satisfaction. Attention to novelty, integrity, fairness, and stability is therefore crucial for improving public emotional experiences and overall well-being.

4.5. Public Opinion Evolution Analysis

The timeline of the “Zibo Barbecue Incident” is given in Table 8. In May 2022, during the pandemic, students from Shandong University were placed under quarantine in Zibo. The local government hosted them free of charge and treated them to a barbecue before their departure, and the students agreed to return to Zibo for barbecue in the spring. This episode lent warmth and a human touch to Zibo’s image and marked the origin of the incident, although public attention was relatively low at the time.

In the Initiation Stage, public sentiment began to fluctuate. This corresponds to the topic “college students organizing trips to Zibo for barbecue” gaining traction on social media from 5 April 2023. Extensive discussion and sharing of Zibo barbecue on social platforms rapidly increased the event’s popularity and caused a surge in public attention; however, public understanding was still incomplete, and sentiment was mixed. In the Outbreak Stage, corresponding to 8–10 April 2023, public sentiment was highly positive. The incident became a hot topic on social media after an influencer’s test confirmed that local businesses were not shortchanging customers and the local authorities released a series of favorable policies. In the subsequent Decline Stage, the event remained a topic of discussion, but attention gradually declined and sentiment levels stabilized, indicating that public emotional responses had become more settled.

Keywords in the public opinion topics, such as “harmonious governance”, “honest”, “assured”, and “deceitful”, show that public attention centered on policies and businesses. The confirmation of favorable local policies and conscientious businesses produced positive public sentiment toward the “Zibo Barbecue Incident”. Overall, conscientious businesses and supportive government policies contribute to favorable evaluations and to local development.

5. Conclusions

This study proposed a method for analyzing the evolution of public sentiment based on the Two-Dimensional Theory of Emotion and the Top2Vec-RoBERTa model, using a sentiment analysis approach that combines sentiment dictionaries with deep learning. Integrating the two methods yielded sentiment analysis results for the central figures in public opinion. Using the “Zibo Barbecue Incident” as a case study, 17,873 comments were collected from Bilibili videos related to the event. The sentiment of these comments was annotated and analyzed using the Dalian University of Technology sentiment dictionary and the RoBERTa model, which reduced the workload of manual annotation. Top2Vec, combined with the Two-Dimensional Theory of Emotion, was used to analyze changes in emotional states along both the valence and arousal dimensions, providing a more comprehensive understanding of the emotional dynamics as public opinion developed. The RoBERTa model achieved a sentiment classification accuracy of 98.46% on the test set. The analyses of sentiment mean evolution, two-dimensional topics, and comment popularity provided deeper insights into, and possible solutions for, related issues. One limitation of this study is that emojis were not taken into account when computing sentiment values with the sentiment dictionary. Future research will explore finer-grained sentiment classification to further improve accuracy.

Author Contributions

Writing—original draft preparation: H.L.; supervision: H.L.; funding acquisition: H.L.; investigation: Y.H.; formal analysis: Y.H.; resources: S.W.; methodology: S.W.; validation: S.W.; data curation: Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Humanity and Social Science Foundation of Ministry of Education of China (nos. 18YJA630037, 21YJA630054).

Institutional Review Board Statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Data Availability Statement

All computer code used in this study is available at the GitHub repository (https://github.com/w164186/Mycode.git (accessed on: 2 January 2025)).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of our paper “Public Opinion Evolution Based on the Two-Dimensional Theory of Emotion and Top2Vec-RoBERTa”.

References

Wang, J.; Zhang, X.; Liu, W.; Li, P. Spatiotemporal pattern evolution and influencing factors of online public opinion—Evidence from the early-stage of COVID-19 in China. Heliyon 2023, 9, e20080.
Ren, S.; Gong, C.; Zhang, C.; Li, C. Public opinion communication mechanism of public health emergencies in Weibo: Take the COVID-19 epidemic as an example. Front. Public Health 2023, 11, 1276083.
Xie, Q.; Han, Q.; Chen, D. Analysis of Sports Popular Trend Based on Public Opinion Mining of New Media. Math. Probl. Eng. 2022, 2022, 9144231.
Zhang, C.; Ma, N.; Sun, G. Using Grounded Theory to Identify Online Public Opinion in China to Improve Risk Management—The Case of COVID-19. Int. J. Environ. Res. Public Health 2022, 19, 14754.
Xu, B.; Liu, Y. The role of big data in network public opinion within the colleges and universities. Soft Comput. 2022, 26, 10853–10862.
Smitha, E.; Sendhilkumar, S.; Mahalakshmi, G. Intelligence system for sentiment classification with deep topic embedding using N-gram based topic modeling. J. Intell. Fuzzy Syst. 2023, 45, 1539–1565.
Zhang, S.; Wei, Z.; Wang, Y.; Liao, T. Sentiment analysis of Chinese micro-blog text based on extended sentiment dictionary. Future Gener. Comput. Syst. 2018, 81, 395–403.
Nie, R.x.; Tian, Z.p.; Wang, J.q.; Chin, K.S. Hotel selection driven by online textual reviews: Applying a semantic partitioned sentiment dictionary and evidence theory. Int. J. Hosp. Manag. 2020, 88, 102495.
Liu, H.; Chen, X.; Liu, X. A study of the application of weight distributing method combining sentiment dictionary and TF-IDF for text sentiment analysis. IEEE Access 2022, 10, 32280–32289.
Stefanis, C.; Giorgi, E.; Kalentzis, K.; Tselemponis, A.; Nena, E.; Tsigalou, C.; Kontogiorgis, C.; Kourkoutas, Y.; Chatzak, E.; Dokas, I.; et al. Sentiment analysis of epidemiological surveillance reports on COVID-19 in Greece using machine learning models. Front. Public Health 2023, 11, 1191730.
Rahman, H.; Tariq, J.; Masood, M.A.; Subahi, A.F.; Khalaf, O.I.; Alotaibi, Y. Multi-tier sentiment analysis of social media text using supervised machine learning. Comput. Mater. Contin. 2023, 74, 5527–5543.
Hokijuliandy, E.; Napitupulu, H.; Firdaniza. Application of SVM and Chi-Square Feature Selection for Sentiment Analysis of Indonesia’s National Health Insurance Mobile Application. Mathematics 2023, 11, 3765.
Alrashidi, M.; Selamat, A.; Ibrahim, R.; Fujita, H. Social Recommender System Based on CNN Incorporating Tagging and Contextual Features. J. Cases Inf. Technol. (JCIT) 2024, 26, 1–20.
Devlin, J. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
Liu, S.; Lee, I. Sequence encoding incorporated CNN model for Email document sentiment classification. Appl. Soft Comput. 2021, 102, 107104.
Arbane, M.; Benlamri, R.; Brik, Y.; Alahmar, A.D. Social media-based COVID-19 sentiment classification model using Bi-LSTM. Expert Syst. Appl. 2023, 212, 118710.
Pota, M.; Ventura, M.; Fujita, H.; Esposito, M. Multilingual evaluation of pre-processing for BERT-based sentiment analysis of tweets. Expert Syst. Appl. 2021, 181, 115119.
He, A.; Abisado, M. Text Sentiment Analysis of Douban Film Short Comments Based on BERT-CNN-BiLSTM-Att Model. IEEE Access 2024, 12, 45229–45237.
Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
Zhao, N.; Fan, G.; Qi, Z.; Shi, J. Exploring the current situation of cultural tourism scenic spots based on LDA model—Take Nanjing, Jiangsu Province, China as an example. Procedia Comput. Sci. 2023, 221, 826–832.
Uthirapathy, S.E.; Sandanam, D. Topic Modelling and Opinion Analysis On Climate Change Twitter Data Using LDA And BERT Model. Procedia Comput. Sci. 2023, 218, 908–917.
Yoo, S.y.; Lim, G.g. A study on the classification of research topics based on COVID-19 academic research using Topic modeling. J. Intell. Inf. Syst. 2022, 28, 155–174.
Russell, J.A. A circumplex model of affect. J. Personal. Soc. Psychol. 1980, 39, 1161.
Angelov, D. Top2Vec: Distributed representations of topics. arXiv 2020, arXiv:2008.09470.
McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426.
McInnes, L.; Healy, J.; Astels, S. hdbscan: Hierarchical density based clustering. J. Open Source Softw. 2017, 2, 205.
Ghasiya, P.; Okamura, K. Investigating COVID-19 news across four nations: A topic modeling and sentiment analysis approach. IEEE Access 2021, 9, 36645–36656.
Ekman, P. An argument for basic emotions. Cogn. Emot. 1992, 6, 169–200.

Figure 1. Research framework.

Figure 2. Conceptual model of the Two-Dimensional Theory of Emotion.

Figure 3. RoBERTa model.

Figure 4. Distribution of public opinion data.

Figure 5. Changes in loss and accuracy over model training epochs.

Figure 6. Valence public emotion analysis.

Figure 7. Arousal public emotion analysis.

Figure 8. Evolution of sentiment mean.

Table 1. Sample sentiment vocabulary ontology.

Word	Part of Speech	Number of Meanings	Meaning Sequence	Emotion Category	Intensity	Polarity
Fearless	idiom	1	1	PH	7	1
Cash-strapped	idiom	1	1	NE	7	0
Tight Thoughtful	adj	1	1	PH	5	1
Exaggeration	idiom	1	1	NN	5	2

Table 2. Classification of sentiment words.

Emotion Category	Emotion Type	Example Words
Happy	Joy (PA)	Joyful, happy, smiling, overjoyed
	Calm (PE)	Secure, relieved, at ease, calm and untroubled
Good	Respect (PD)	Admire, respect, salute, revere
	Praise (PH)	Heroic, excellent, distinguished, praiseworthy
	Trust (PG)	Trust, rely on, believe, be confident
	Love (PB)	Fondness, beloved, love, cherish
	Wish (PK)	Wish, desire, hope for, long for
Anger	Anger (NA)	Angry, outraged, furious, enraged
Sadness	Sad (NB)	Sad, sorrowful, heartbroken, grief-stricken
	Despair (NJ)	Desperate, hopeless, desolate, devastated
	Regret (NH)	Regret, remorse, guilt, sorrow
	Pity (PF)	Pity, sympathy, compassion, sorrow
Fear	Anxiety (NI)	Anxious, uneasy, apprehensive, jittery
	Fear (NC)	Fearful, scared, afraid, terrified
	Shame (NG)	Shameful, disgraced, humiliated, mortified
Disgust	Disgust (NE)	Disgusted, repulsed, sickened, revolted
	Hate (ND)	Hatred, loathing, abhorrence, aversion
	Contempt (NN)	Contemptuous, scornful, disdainful, sneering
	Jealousy (NK)	Jealous, envious, resentful, covetous
	Regret (NL)	Regretful, remorseful, sorry, rueful
Surprise	Surprise (PC)	Astonished, amazed, surprised, shocked

Table 3. Sentiment dictionary calculation results.

Time	Content	Disgust	Sadness	Good	Happy	Label
2023-04-08	That’s a real good businessman.	0	0	1	0	1
2023-04-08	Why bother others when you’re so old? If you want to eat, rob the tourists yourself.	1	0	0	0	−1
2023-04-08	I brought a scale in Shandong. I’m sorry. I brought a scale in Chengdu. I’m sorry.	0	2	0	0	0
2023-04-08	The Art of Speaking	0	0	0	0	0
2023-04-08	I’m a bit touched and proud of Zibo, but I’m touched and proud of the most normal and desirable thing I’ve ever seen.	0	0	0	2	1
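Table 3 counts, for each comment, how many dictionary words fall into each coarse emotion category of Table 2. A minimal Python sketch of that counting step follows, assuming the comment has already been segmented into tokens and that the lexicon maps each word to one fine-grained category code. The lexicon entries and function name are hypothetical stand-ins, not entries from the Dalian University of Technology dictionary. The rule that turns these counts into the Label column is not restated in this part of the paper (the third row has a Sadness count of 2 but a Label of 0), so the sketch stops at the counts.

```python
from collections import Counter

# Fine-grained codes grouped into the four coarse categories that Table 3 reports,
# following the grouping in Table 2.
COARSE_CATEGORY = {
    "PA": "Happy", "PE": "Happy",
    "PD": "Good", "PH": "Good", "PG": "Good", "PB": "Good", "PK": "Good",
    "NB": "Sadness", "NJ": "Sadness", "NH": "Sadness", "PF": "Sadness",
    "NE": "Disgust", "ND": "Disgust", "NN": "Disgust", "NK": "Disgust", "NL": "Disgust",
}
TABLE3_COLUMNS = ("Disgust", "Sadness", "Good", "Happy")


def category_counts(tokens, lexicon):
    """Count tokens per coarse category; unknown words and other categories are ignored."""
    counts = Counter()
    for token in tokens:
        category = COARSE_CATEGORY.get(lexicon.get(token))
        if category is not None:
            counts[category] += 1
    return {column: counts[column] for column in TABLE3_COLUMNS}


# Hypothetical lexicon entries for illustration only.
toy_lexicon = {"good": "PH", "touched": "PA", "proud": "PA", "rob": "NE"}
print(category_counts("that is a real good businessman".split(), toy_lexicon))
```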

Table 4. Comparison of model performance in sentiment analysis.

	Precision	Recall	F1 Score
LSTM	88.0587%	85.1748%	84.2704%
BERT	97.3348%	97.3217%	97.3251%
Bi-LSTM	88.0335%	85.1748%	84.2682%
RoBERTa	98.2630%	98.2657%	98.2629%
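Table 4 reports precision, recall, and F1, while the text quotes accuracy. For the LSTM and Bi-LSTM rows, F1 is lower than both precision and recall, which cannot happen if F1 is the harmonic mean of a single precision/recall pair; the figures are therefore consistent with per-class scores averaged across classes, although the exact averaging scheme is an assumption here. A minimal Python sketch of accuracy plus macro-averaged precision, recall, and F1 follows; the function name and toy labels are hypothetical.

```python
from statistics import mean


def macro_report(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    # Macro averaging: every class counts equally, regardless of its size.
    return accuracy, mean(precisions), mean(recalls), mean(f1s)


# Toy labels (1 = positive, 0 = negative) for illustration only.
print(macro_report([1, 1, 0, 0], [1, 0, 0, 0]))
```

On these toy labels, accuracy is 0.75 and macro-F1 (about 0.733) falls below both macro precision (about 0.833) and macro recall (0.75), the same pattern as the LSTM rows.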

Table 5. Valence analysis of public sentiment.

	Initiation Stage	Outbreak Stage	Decline Stage	Resolution Stage
High Valence (num)	666	4632	6364	783
Percentage (%)	5.35%	37.22%	51.14%	6.29%
Low Valence (num)	218	1878	2451	305
Percentage (%)	4.49%	38.71%	50.52%	6.28%

Table 6. Arousal analysis of public sentiment.

	Initiation Stage	Outbreak Stage	Decline Stage	Resolution Stage
High Arousal (num)	83	889	1296	182
Percentage (%)	3.39%	36.29%	52.89%	7.43%
Low Arousal (num)	801	5621	7519	906
Percentage (%)	5.39%	37.86%	50.64%	6.11%

Table 7. Distribution of topics and keywords in two dimensions.

Topic	Valence	Arousal
Topic 1	Taste, Northeast, Grocery Shopping, Going Out, Small Cakes, Atmosphere	Classmates, Special, First Time, Express, Feelings
Topic 2	Business, Honest, Short Weight, Dining, Assured	Grocery Shopping, Unexpectedly, Integrity, Prices, Exceeded Expectations
Topic 3	Observing, Marketing, Qingdao, Special, Affirmative, Remember	Taxi, Two, Jinan, Hotel, Remember
Topic 4	Feelings, Harmonious Governance, Deceitful, Customers, Market Supervision	Qingdao, Weifang, Yantai, Experience, Harmonious Governance, Simple and Honest

Table 8. The timeline of the Zibo Barbecue Incident.

Date	Incidents
2022-05	During the home quarantine incident in Zibo involving Shandong University students, the local government warmly hosted them for free. Before leaving, they were treated to a barbecue, and they agreed to meet again in Zibo for barbecue when the spring arrived.
2023-04-05	With the public largely in a state of recovery from COVID-19, the correct way to enjoy Zibo barbecue has started to spread. The topic of “college students organizing trips to Zibo for barbecue” has gradually gained popularity on online platforms.
2023-04-08	A popular Douyin influencer tested the fairness of scales in Zibo and found that no store was shortchanging customers. The genuine and honest quality of Zibo locals once again brought Zibo barbecue into the spotlight. A wealth of user-generated content continues to be produced on various online platforms, generating high levels of engagement.
2023-04-10	The city of Zibo held a special press conference for barbecue, launched 24 dedicated high-speed train services for weekend round trips, and introduced 21 new customized barbecue bus routes as part of a series of comprehensive services. Consequently, “Zibo Barbecue” continuously trended on major platforms, and both “Zibo Barbecue” and the city of “Zibo” became the latest internet sensations.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

