Discovering Sentimental Interaction via Graph Convolutional Network for Visual Sentiment Prediction
Abstract
1. Introduction
- We propose an end-to-end image sentiment analysis framework that employs a GCN to extract sentimental interaction features among objects. Rather than directly integrating visual features, the proposed model makes extensive use of the interactions between objects in the emotional space (see the sketch after this list).
- We design a method to construct graphs over images using Detectron2 and SentiWordNet. Based on an analysis of the public datasets, we take brightness and texture as node features and distances in the emotional space as edge weights, which effectively describe the appearance characteristics of objects.
- We evaluate our method on five affective datasets, where it outperforms previous high-performing approaches.
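As a minimal, illustrative sketch of the kind of graph convolution the framework builds on (not the paper's exact architecture): the propagation rule below is the standard GCN layer, the node features stand in for the brightness/texture descriptors, and the Gaussian weighting of distances in an assumed 2-D emotional space is our own choice for turning "closeness in emotion" into edge strength.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, a_hat):
        # h: node features (N, in_dim); a_hat: normalized adjacency (N, N)
        return torch.relu(a_hat @ self.linear(h))

def normalize_adjacency(a):
    """Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a = a + torch.eye(a.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]

# Four hypothetical detected objects placed in a 2-D emotional space;
# edges are stronger for objects that are emotionally closer (assumption).
coords = torch.tensor([[0.8, 0.1], [0.7, 0.2], [-0.5, 0.4], [0.1, -0.3]])
adj = torch.exp(-torch.cdist(coords, coords))  # distance -> similarity
adj.fill_diagonal_(0.0)                        # self-loops added in normalization

feats = torch.rand(4, 16)                      # stand-ins for brightness/texture features
layer = GCNLayer(16, 8)
out = layer(feats, normalize_adjacency(adj))   # (4, 8) interaction-aware features
```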
2. Related Work
2.1. Visual Sentiment Prediction
2.2. Graph Convolutional Network (GCN)
3. Method
3.1. Framework
3.2. Graph Construction
3.2.1. Object Recognition
3.2.2. Graph Representation
3.2.3. Feature Representation
3.3. Interaction Graph Inference
3.4. Visual Feature Representation
3.5. GCN-Based Classifier Learning
4. Experimental Results
4.1. Datasets
4.2. Implementation Details
4.3. Evaluation Settings
- The global color histogram (GCH) feature consists of a 64-bin RGB histogram, and the local color histogram (LCH) feature divides the image into 16 blocks and generates a 64-bin RGB histogram for each block [31] (a sketch of both follows this list).
- Borth et al. [28] propose SentiBank, which describes the sentiment concept with 1200 adjective-noun pairs (ANPs) and performs better for images with rich semantics.
- DeepSentiBank [32] utilizes CNNs to discover ANPs and realizes visual sentiment concept classification. We apply the pre-trained DeepSentiBank to extract 2089-dimensional features from the last fully connected layer and employ LIBSVM for classification.
- You et al. [29] propose to select a potentially cleaner training dataset and design the PCNN, which is a progressive model based on CNNs.
- Yang et al. [9] employ an object detection technique to produce "Affective Regions" and propose three fusion strategies to generate the final predictions.
- Wu et al. [8] utilize saliency detection to enhance local features, improving classification performance by a large margin. They also adopt an ensemble strategy, which may contribute to the performance improvement.
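To make the color-histogram baselines above concrete, here is a minimal sketch. The 4×4×4 RGB quantization (64 joint bins) and the 4×4 block grid for LCH are our reading of the description; scikit-learn's `SVC` (built on LIBSVM) stands in for the LIBSVM classifier, and the random data is purely illustrative.

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn's SVC wraps LIBSVM

def gch(image):
    """Global color histogram: 4 levels per RGB channel -> 64 joint bins."""
    q = (image.reshape(-1, 3) // 64).astype(int)       # quantize 0..255 -> 0..3
    idx = q[:, 0] * 16 + q[:, 1] * 4 + q[:, 2]         # joint bin index in [0, 63]
    hist = np.bincount(idx, minlength=64).astype(float)
    return hist / hist.sum()

def lch(image, grid=4):
    """Local color histogram: a 64-bin GCH for each of the 16 (4x4) blocks."""
    h, w, _ = image.shape
    bh, bw = h // grid, w // grid
    blocks = [image[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
              for i in range(grid) for j in range(grid)]
    return np.concatenate([gch(b) for b in blocks])    # 16 * 64 = 1024 dims

# Hypothetical usage on random "images"; real baselines train on dataset features.
images = np.random.randint(0, 256, size=(20, 64, 64, 3), dtype=np.uint8)
labels = np.random.randint(0, 2, size=20)              # binary sentiment labels
X = np.stack([lch(img) for img in images])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:3]))
```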
4.4. Classification Performance
4.5. The Role of the GCN Branch
4.6. Effect of Panoptic Segmentation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhao, S.; Gao, Y.; Ding, G.; Chua, T. Real-time Multimedia Social Event Detection in Microblog. IEEE Trans. Cybern. 2017, 48, 3218–3231. [Google Scholar] [CrossRef]
- Peng, K.C.; Chen, T.; Sadovnik, A.; Gallagher, A.C. A Mixed Bag of Emotions: Model, Predict, and Transfer Emotion Distributions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 860–868. [Google Scholar]
- You, Q.; Luo, J.; Jin, H.; Yang, J. Building A Large Scale Dataset for Image Emotion Recognition: The Fine Print and the Benchmark. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 308–314. [Google Scholar]
- Zhu, X.; Li, L.; Zhang, W.; Rao, T.; Xu, M.; Huang, Q.; Xu, D. Dependency Exploitation: A Unified CNN-RNN Approach for Visual Emotion Recognition. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 3595–3601. [Google Scholar]
- Compton, R.J. The Interface Between Emotion and Attention: A Review of Evidence from Psychology and Neuroscience. Behav. Cogn. Neurosci. Rev. 2003, 2, 115–129. [Google Scholar] [CrossRef]
- Zheng, H.; Chen, T.; You, Q.; Luo, J. When Saliency Meets Sentiment: Understanding How Image Content Invokes Emotion and Sentiment. In Proceedings of the 2017 IEEE International Conference on Image Processing, Beijing, China, 17–20 September 2017; pp. 630–634. [Google Scholar]
- Fan, S.; Shen, Z.; Jiang, M.; Koenig, B.L.; Xu, J.; Kankanhalli, M.S.; Zhao, Q. Emotional Attention: A Study of Image Sentiment and Visual Attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7521–7531. [Google Scholar]
- Wu, L.; Qi, M.; Jian, M.; Zhang, H. Visual Sentiment Analysis by Combining Global and Local Information. Neural Process. Lett. 2019, 51, 1–13. [Google Scholar] [CrossRef]
- Yang, J.; She, D.; Sun, M.; Cheng, M.M.; Rosin, P.L.; Wang, L. Visual Sentiment Prediction Based on Automatic Discovery of Affective Regions. IEEE Trans. Multimed. 2018, 20, 2513–2525. [Google Scholar] [CrossRef] [Green Version]
- Esuli, A.; Sebastiani, F. Sentiwordnet: A Publicly Available Lexical Resource for Opinion Mining. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, 22–28 May 2006; pp. 417–422. [Google Scholar]
- Nicolaou, M.A.; Gunes, H.; Pantic, M. A Multi-layer Hybrid Framework for Dimensional Emotion Classification. In Proceedings of the 19th ACM International Conference on Multimedia, Scottsdale, AZ, USA, 28 November–1 December 2011; pp. 933–936. [Google Scholar]
- Xu, M.; Jin, J.S.; Luo, S.; Duan, L. Hierarchical Movie Affective Content Analysis Based on Arousal and Valence Features. In Proceedings of the 16th ACM International Conference on Multimedia, Vancouver, BC, Canada, 26–31 October 2008; pp. 677–680. [Google Scholar]
- Zhao, S.; Gao, Y.; Jiang, X.; Yao, H.; Chua, T.S.; Sun, X. Exploring Principles-of-art Features for Image Emotion Recognition. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 47–56. [Google Scholar]
- Machajdik, J.; Hanbury, A. Affective Image Classification Using Features Inspired by Psychology and Art Theory. In Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy, 25–29 October 2010; pp. 83–92. [Google Scholar]
- Zhao, S.; Yao, H.; Yang, Y.; Zhang, Y. Affective Image Retrieval via Multi-graph Learning. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 1025–1028. [Google Scholar]
- Hanjalic, A. Extracting Moods from Pictures and Sounds: Towards Truly Personalized TV. IEEE Signal Process. Mag. 2006, 23, 90–100. [Google Scholar] [CrossRef]
- Zhao, S.; Yao, H.; Gao, Y.; Ding, G.; Chua, T.S. Predicting Personalized Image Emotion Perceptions in Social Networks. IEEE Trans. Affect. Comput. 2018, 9, 526–540. [Google Scholar] [CrossRef]
- Yang, J.; She, D.; Lai, Y.K.; Yang, M.H. Retrieving and Classifying Affective Images via Deep Metric Learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
- Sun, M.; Yang, J.; Wang, K.; Shen, H. Discovering Affective Regions in Deep Convolutional Neural Networks for Visual Sentiment Prediction. In Proceedings of the 2016 IEEE International Conference on Multimedia and Expo, Barcelona, Spain, 12–15 April 2016; pp. 1–6. [Google Scholar]
- You, Q.; Jin, H.; Luo, J. Visual Sentiment Analysis by Attending on Local Image Regions. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 231–237. [Google Scholar]
- Gori, M.; Monfardini, G.; Scarselli, F. A New Model for Learning in Graph Domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; pp. 729–734. [Google Scholar]
- Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2008, 20, 61–80. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral Networks and Locally Connected Networks on Graphs. In Proceedings of the 2nd International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Chen, Z.M.; Wei, X.S.; Wang, P.; Guo, Y. Multi-label Image Recognition with Graph Convolutional Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5177–5186. [Google Scholar]
- Peng, K.C.; Sadovnik, A.; Gallagher, A.; Chen, T. Where Do Emotions Come From? Predicting the Emotion Stimuli Map. In Proceedings of the 2016 IEEE International Conference on Image Processing, Phoenix, AZ, USA, 25–28 September 2016; pp. 614–618. [Google Scholar]
- Guo, D.; Wang, H.; Zhang, H.; Zha, Z.J.; Wang, M. Iterative Context-Aware Graph Inference for Visual Dialog. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10055–10064. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Borth, D.; Ji, R.; Chen, T.; Breuel, T.; Chang, S.F. Large-scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs. In Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Spain, 21–25 October 2013; pp. 223–232. [Google Scholar]
- You, Q.; Luo, J.; Jin, H.; Yang, J. Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 381–388. [Google Scholar]
- Mikels, J.A.; Fredrickson, B.L.; Larkin, G.R.; Lindberg, C.M.; Maglio, S.J.; Reuter-Lorenz, P.A. Emotional Category Data on Images from the International Affective Picture System. Behav. Res. Methods 2005, 37, 626–630. [Google Scholar] [CrossRef] [PubMed]
- Siersdorfer, S.; Minack, E.; Deng, F.; Hare, J. Analyzing and Predicting Sentiment of Images on the Social Web. In Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy, 25–29 October 2010; pp. 715–718. [Google Scholar]
- Chen, T.; Borth, D.; Darrell, T.; Chang, S.F. Deep SentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks. arXiv 2014, arXiv:1410.8586. [Google Scholar]
Dataset | Number of Images | Source | #Annotators | Emotion Model |
---|---|---|---|---|
FI | 23,308 | social media | 225 | Mikels |
Flickr | 484,258 | social media | - | Ekman |
TwitterI | 1269 | social media | 5 | Sentiment |
TwitterII | 603 | social media | 9 | Sentiment |
EmotionROI | 1980 | social media | 432 | Ekman |
Dataset | Learning Rate | Drop Factor | Cropped Size | Momentum | Optimizer |
---|---|---|---|---|---|
FI | 0.01 | 20 | 224 × 224 | 0.9 | SGD |
Flickr | 0.01 | 5 | 224 × 224 | 0.9 | SGD |
TwitterI | 0.02 | 30 | 224 × 224 | 0.9 | SGD |
TwitterII | 0.03 | 20 | 224 × 224 | 0.9 | SGD |
EmotionROI | 0.03 | 30 | 224 × 224 | 0.9 | SGD |
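A sketch of how the settings in the table above might map to PyTorch, shown for the FI row. The VGG backbone is an assumption (the paper's visual branch is not reproduced here), and reading the "drop factor" of 20 as the number of epochs between learning-rate decays, with a decay ratio of 0.1, is likewise our interpretation.

```python
import torch
import torchvision.transforms as T
from torchvision.models import vgg16

# Assumed backbone; replace with the paper's actual visual branch.
model = vgg16()

# FI row of the table: SGD, lr 0.01, momentum 0.9; drop factor 20 read as
# the StepLR step size in epochs (gamma 0.1 assumed).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

# 224 x 224 crops, as listed for every dataset in the table.
train_transform = T.Compose([
    T.Resize(256),
    T.RandomCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
```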
Method | FI | Flickr | Twitter I-5 | Twitter I-4 | Twitter I-3 | Twitter II | EmotionROI |
---|---|---|---|---|---|---|---|
GCH | - | - | 67.91 | 67.20 | 65.41 | 77.68 | 66.53
LCH | - | - | 70.18 | 68.54 | 65.93 | 75.98 | 64.29 |
SentiBank | - | - | 71.32 | 68.28 | 66.63 | 65.93 | 66.18 |
DeepSentiBank | 61.54 | 57.83 | 76.35 | 70.15 | 71.25 | 70.23 | 70.11 |
VGGNet [27] | 70.64 | 61.28 | 83.44 | 78.67 | 75.49 | 71.79 | 72.25 |
PCNN | 75.34 | 70.48 | 82.54 | 76.50 | 76.36 | 77.68 | 73.58 |
Yang [9] | 86.35 | 71.13 | 88.65 | 85.10 | 81.06 | 80.48 | 81.26 |
Ours-single | 88.12 | 72.31 | 89.24 | 85.19 | 81.25 | 80.59 | 83.62 |
Wu [8] | 88.84 | 72.39 | 89.50 | 86.97 | 81.65 | 80.97 | 83.04 |
Ours-ensemble | 88.71 | 73.11 | 89.65 | 84.48 | 81.72 | 82.68 | 84.29 |
Method | FI | Flickr | Twitter I-5 | Twitter I-4 | Twitter I-3 | Twitter II | EmotionROI |
---|---|---|---|---|---|---|---|
Fine-tuned VGGNet | 83.05 | 70.12 | 84.35 | 82.26 | 76.75 | 76.99 | 77.02 |
Ours-single | 88.12 | 72.31 | 89.24 | 85.19 | 81.25 | 80.59 | 83.62 |