Transferring Sentiment Cross-Lingually within and across Same-Family Languages
Abstract
1. Introduction
2. Research Questions and Hypotheses
- Cross-lingual transfer is more successful between typologically similar languages than between typologically different languages.
- A large annotated dataset in a distant-family language can overcome typological differences, unlike a small annotated dataset in a close-family language.
- First, we propose a unified deep learning framework that applies existing labeled data from high-resource languages to low-resource datasets. We conduct rigorous experiments on languages within the same language family and investigate how effectively sentiment classification ability can be transferred.
- Second, we demonstrate that, given multiple large-scale training datasets, our framework outperforms a straightforward fine-tuning setup. Finally, we devise the optimal method for jointly training sentiment analysis systems in order to address the issue of insufficient resources for target languages.
3. Languages in the Study
4. Related Work
4.1. Sentiment Analysis
4.2. Sentiment Analysis in Slavic Languages
4.3. Cross-Lingual Sentiment Analysis
5. Data
Sentiment Analysis Datasets
- Bulgarian: The Cinexio [51] dataset is composed of film reviews with 11-point star ratings: 0 (negative), 0.5, 1, … 4.5, 5 (positive). Other meta-features included in the dataset are film length, director, actors, genre, country, and various scores.
- Croatian: Pauza [68] contains restaurant reviews from Pauza.hr, the largest food ordering website in Croatia. Each review is assigned an opinion rating ranging from 0.5 (worst) to 6 (best). The user-assigned ratings serve as the reference labels. The dataset also contains opinionated aspects.
- Czech: The CSFD [108] dataset was inspired by Pang et al. [109]. It includes film reviews from the Czech Movie Database (http://www.csfd.cz, accessed on 10 September 2023). Every review is classified as positive, neutral, or negative.
- English: The Multilingual Amazon Reviews Corpus (MARC) is a large collection of Amazon reviews [110]. The corpus contains reviews written in Chinese, English, Japanese, German, French, and Spanish. Each review is assigned a maximum of five stars. Each record contains the review text, the title, the star rating, and product-related metadata.
- Polish: The Wroclaw Corpus of Consumer Reviews Sentiment [77] is a multi-domain dataset of Polish reviews from the domains of schools, medicine, hotels, and products. The texts have been annotated at both the sentence level and the text body level. The reviews are labeled as follows: [+m] represents a strong positive; [+s] represents a weak positive; [−m] represents a strong negative; [−s] represents a weak negative; [amb] represents ambiguity; and [0] represents neutrality.
- Russian: The ROMIP-12 dataset [80] is composed of news-based opinions, which are excerpts of the direct and indirect speech published in news articles. Politics, economics, sports, and the arts are just some of the diverse subject areas covered. This dataset contains speech classified as positive, neutral, or negative.
- Slovak: The Review3 dataset [111] is composed of customer evaluations of a variety of services. Reviews are labeled on 1–3 and 1–5 scales.
- Slovene: The Opinion corpus of Slovene web commentaries KKS 1.001 [90] includes web commentaries on various topics (business, politics, sports, etc.) from four Slovene web portals (RtvSlo, 24ur, Finance, and Reporter). Each instance within the dataset is tagged with one of the three labels (negative, neutral, or positive).
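The corpora listed above use different label schemes (star ratings on several scales, a Polish tag set, and three-way polarity), so they need a shared label set before they can be trained on jointly. The following is a minimal sketch of one way to do this. The function names, the 40%/60% cut-offs, and the decision to drop ambiguous Polish items are illustrative assumptions, not the mapping used in the paper.

```python
from typing import Optional

# Assumed cut-offs on the normalised [0, 1] rating scale (illustrative only).
NEGATIVE_BELOW = 0.4
POSITIVE_ABOVE = 0.6


def rating_to_polarity(rating: float, low: float, high: float) -> str:
    """Map a numeric rating on the scale [low, high] to a three-way label."""
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} is outside [{low}, {high}]")
    position = (rating - low) / (high - low)  # normalise to [0, 1]
    if position < NEGATIVE_BELOW:
        return "negative"
    if position > POSITIVE_ABOVE:
        return "positive"
    return "neutral"


# Polish tag set: strong/weak polarity collapses to plain polarity.
POLISH_TAGS = {
    "+m": "positive",
    "+s": "positive",
    "-m": "negative",
    "-s": "negative",
    "0": "neutral",
}


def polish_tag_to_polarity(tag: str) -> Optional[str]:
    """Map a tag such as "[+m]" to a three-way label.

    Returns None for "[amb]" or unknown tags (assumption: such items
    are left out of three-way training).
    """
    key = tag.strip("[]").replace("\u2212", "-")  # accept the Unicode minus sign
    return POLISH_TAGS.get(key)


if __name__ == "__main__":
    print(rating_to_polarity(2, low=1, high=5))      # a 2-star review
    print(rating_to_polarity(4.5, low=0, high=5))    # a high rating on a 0-5 scale
    print(polish_tag_to_polarity("[+s]"))
```

With these cut-offs, a 1–5 star scale collapses in the usual way (1–2 negative, 3 neutral, 4–5 positive); other scales would need their own thresholds checked against the data.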
6. Methodology
- Used directly to train the model. Here, the source language serves as the target language as well (like Bulgarian).
- Combined with a single dataset from a distant language family (like English).
- Combined with a single dataset from a different sub-branch of the same language family (like Russian, Polish, or Czech).
- Merged with a number of low-resource language datasets (Croatian, Slovak, and Slovene).
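The four data configurations above can be sketched as a single helper that starts from one language's training set and optionally adds data from other languages. This is only an illustration: the function name, the language codes, the optional per-source cap, and the toy data are assumptions, and the paper's actual sampling procedure is not specified here.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (text, polarity label)


def build_training_mix(
    datasets: Dict[str, List[Example]],
    base: str,
    extra_sources: Sequence[str] = (),
    cap_per_source: Optional[int] = None,
    seed: int = 13,
) -> List[Example]:
    """Combine the base language's training data with extra source languages.

    cap_per_source (hypothetical knob) limits how many examples each extra
    source contributes, so a very large corpus does not swamp a small one.
    """
    rng = random.Random(seed)
    mix = list(datasets[base])
    for lang in extra_sources:
        pool = list(datasets[lang])
        if cap_per_source is not None and len(pool) > cap_per_source:
            pool = rng.sample(pool, cap_per_source)
        mix.extend(pool)
    rng.shuffle(mix)  # interleave languages for joint training
    return mix


if __name__ == "__main__":
    # Toy stand-ins for the real corpora.
    toy = {
        "bg": [("bg text", "positive")] * 3,
        "en": [("en text", "negative")] * 10,
        "ru": [("ru text", "neutral")] * 5,
        "hr": [("hr text", "positive")] * 2,
        "sk": [("sk text", "negative")] * 2,
        "sl": [("sl text", "neutral")] * 2,
    }
    configurations = {
        "base only": (),
        "plus distant family": ("en",),
        "plus other sub-branch": ("ru",),
        "plus low-resource": ("hr", "sk", "sl"),
    }
    for name, extras in configurations.items():
        print(name, len(build_training_mix(toy, "bg", extras)))
```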
6.1. Model Details
6.2. Training
- Using only source-language data for fine-tuning. This is the conventional transfer learning setup, in which the classifier is fine-tuned on the source language alone. The trained model is then evaluated zero-shot on the test set of the target language. We guided the training process using the target language’s validation set. Because the target-language labels may not match those of the source language, we projected the fine-grained 5-class labels onto 3 coarse-grained classes.
- Fine-tuning with a single source and target language. We sampled training sets from multiple languages and jointly trained the classifier. We used datasets from both distantly related and closely related languages.
- Fine-tuning using multiple datasets derived from a single source and target language. This is a multilingual environment with multiple sources.
- Fine-tuning with the Latin versions of the Bulgarian and Russian datasets.
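For the last setup, a Latin version of a Cyrillic-script dataset could be produced by character-level transliteration. The sketch below is an assumption about how such a conversion might look: the paper's actual romanization scheme is not given here, and the table is a simplified, Bulgarian-leaning mapping (for example, "щ" as "sht" and "ъ" as "a") that glosses over differences between Bulgarian and Russian conventions.

```python
# Simplified Cyrillic-to-Latin table (illustrative; not an official standard).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "sht",
    "ъ": "a", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def to_latin(text: str) -> str:
    """Transliterate Cyrillic characters; leave everything else unchanged."""
    pieces = []
    for ch in text:
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            pieces.append(ch)  # digits, punctuation, Latin letters, etc.
        elif ch.isupper():
            pieces.append(latin.capitalize())
        else:
            pieces.append(latin)
    return "".join(pieces)


if __name__ == "__main__":
    print(to_latin("добър филм"))
    print(to_latin("Москва"))
```

A character table like this keeps the text aligned with the original token by token, which makes it easy to reuse the existing labels unchanged.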
7. Experimental Setup
Training Details
8. Results and Discussion
8.1. Results
8.2. Error Analysis
8.3. Language Representations in XLM-RoBERTa
9. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
ANN | Artificial neural network |
BERT | Bidirectional Encoder Representations from Transformers |
CLSA | Cross-lingual sentiment analysis |
IR | Indo-European |
NERC | Named Entity Recognition and Classification |
NLP | Natural language processing |
PLM | Pre-trained language model |
PMI | Pointwise mutual information |
SO | Semantic orientation |
SVM | Support vector machines |
QA | Question Answering |
References
- Go, A.; Bhayani, R.; Huang, L. Twitter Sentiment Classification Using Distant Supervision; CS224N Project Report; Stanford University: Stanford, CA, USA, 2009. [Google Scholar]
- Nakov, P.; Rosenthal, S.; Kozareva, Z.; Stoyanov, V.; Ritter, A.; Wilson, T. SemEval-2013 Task 2: Sentiment Analysis in Twitter. In Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, GA, USA, 14–15 June 2013; pp. 14–15. [Google Scholar]
- Saif, H.; Fernández, M.; He, Y.; Alani, H. Evaluation datasets for Twitter sentiment analysis: A survey and a new dataset, the STS-Gold. In Proceedings of the 1st International Workshop on Emotion and Sentiment in Social and Expressive Media: Approaches and Perspectives from AI (ESSEM 2013), Turin, Italy, 3 December 2013. [Google Scholar]
- Wilson, T.; Wiebe, J.; Hoffmann, P. Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. Comput. Linguist. 2009, 35, 399–433. [Google Scholar] [CrossRef]
- Agarwal, A.; Xie, B.; Vovsha, I.; Rambow, O.; Passonneau, R.J. Sentiment analysis of Twitter data. In Proceedings of the Workshop on Language in Social Media (LSM 2011), Portland, OR, USA, 23 June 2011; pp. 30–38. [Google Scholar]
- Wilson, T.; Wiebe, J.; Hoffmann, P. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, BC, Canada, 6–8 October 2005; pp. 347–354. [Google Scholar]
- Socher, R.; Lin, C.C.Y.; Ng, A.Y.; Manning, C.D. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the ICML, Bellevue, WA, USA, 28 June–2 July 2011. [Google Scholar]
- Wan, X. Co-training for cross-lingual sentiment classification. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore, 2–7 August 2009; Su, K., Su, J., Wiebe, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2009; pp. 235–243. [Google Scholar]
- Banea, C.; Mihalcea, R.; Wiebe, J.; Hassan, S. Multilingual subjectivity analysis using machine translation. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, HI, USA, 25–27 October 2008; pp. 127–135. [Google Scholar]
- Balahur, A.; Turchi, M. Multilingual sentiment analysis using machine translation? In Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis, Jeju, Republic of Korea, 12 July 2012; pp. 52–60. [Google Scholar]
- Balamurali, A.R.; Joshi, A.; Bhattacharyya, P. Cross-Lingual Sentiment Analysis for Indian Languages using Linked WordNets. In Proceedings of the COLING 2012: Posters, Mumbai, India, 8–15 December 2012; pp. 73–82. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep contextualized word representations. In Proceedings of the NAACL-HLT, New Orleans, LA, USA, 1–6 June 2018; pp. 2227–2237. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
- Grancharova, M.; Dalianis, H. Applying and Sharing pre-trained BERT-models for Named Entity Recognition and Classification in Swedish Electronic Patient Records. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), Reykjavik, Iceland, 31 May–2 June 2021; pp. 231–239. [Google Scholar]
- Wang, Z.; Ng, P.; Ma, X.; Nallapati, R.; Xiang, B. Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5878–5882. [Google Scholar] [CrossRef]
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Advances in Neural Information Processing Systems, Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32. [Google Scholar]
- Jiang, H.; He, P.; Chen, W.; Liu, X.; Gao, J.; Zhao, T. SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 6–8 July 2020; pp. 2177–2190. [Google Scholar] [CrossRef]
- Das, A.; Sarkar, S. A survey of the model transfer approaches to cross-lingual dependency parsing. ACM Trans. Asian Low-Resour. Lang. Inf. Process. (TALLIP) 2020, 19, 67. [Google Scholar] [CrossRef]
- Chen, X.; Awadallah, A.H.; Hassan, H.; Wang, W.; Cardie, C. Multi-Source Cross-Lingual Model Transfer: Learning What to Share. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 3098–3112. [Google Scholar] [CrossRef]
- Kandula, H.; Min, B. Improving Cross-Lingual Sentiment Analysis via Conditional Language Adversarial Nets. In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP, Online, 10 June 2021; pp. 32–37. [Google Scholar] [CrossRef]
- Lohar, P.; Popović, M.; Way, A. Building English-to-Serbian Machine Translation System for IMDb Movie Reviews. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing BSNLP 2019, Florence, Italy, 2 August 2019; pp. 105–113. [Google Scholar]
- Chen, X.; Sun, Y.; Athiwaratkun, B.; Cardie, C.; Weinberger, K. Adversarial deep averaging networks for cross-lingual sentiment classification. Trans. Assoc. Comput. Linguist. 2018, 6, 557–570. [Google Scholar] [CrossRef]
- Crystal, D. A Dictionary of Linguistics and Phonetics; John Wiley & Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
- Sussex, R.; Cubberley, P. The Slavic Languages; Cambridge University Press: Cambridge, UK, 2006. [Google Scholar]
- Golubović, J.; Gooskens, C. Mutual Intelligibility between West and South Slavic Languages. Russ. Linguist. 2015, 39, 351–373. [Google Scholar] [CrossRef]
- Townsend, C.E.; Janda, L.A. Common and Comparative Slavic: Phonology and Inflection: With Special Attention to Russian, Polish, Czech, Serbo-Croatian, Bulgarian; Slavica Pub: Bloomington, IN, USA, 1996. [Google Scholar]
- Turney, P. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 7–12 July 2002; pp. 417–424. [Google Scholar] [CrossRef]
- Kim, S.M.; Hovy, E. Determining the sentiment of opinions. In Proceedings of the COLING 2004: 20th International Conference on Computational Linguistics, Geneva, Switzerland, 23–27 August 2004; pp. 1367–1373. [Google Scholar]
- Polanyi, L.; Zaenen, A. Contextual valence shifters. In Computing Attitude and Affect in Text: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–10. [Google Scholar]
- Riloff, E.; Wiebe, J. Learning extraction patterns for subjective expressions. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, Sapporo, Japan, 11–12 July 2003; pp. 105–112. [Google Scholar]
- Esuli, A.; Sebastiani, F. SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy, 22–28 May 2006. [Google Scholar]
- Stone, P.J.; Hunt, E.B. A computer approach to content analysis: Studies using the general inquirer system. In Proceedings of the Spring Joint Computer Conference, Detroit, MI, USA, 21–23 May 1963; pp. 241–256. [Google Scholar]
- Cambria, E.; Speer, R.; Havasi, C.; Hussain, A. Senticnet: A publicly available semantic resource for opinion mining. In Proceedings of the 2010 AAAI Fall Symposium Series, Arlington, VA, USA, 11–13 November 2010. [Google Scholar]
- Nielsen, F.Å. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. In Proceedings of the Workshop on ‘Making Sense of Microposts’: Big Things Come in Small Packages, Heraklion, Crete, Greece, 30 May 2011; pp. 93–98. [Google Scholar]
- Mullen, T.; Collier, N. Sentiment analysis using support vector machines with diverse information sources. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004; pp. 412–418. [Google Scholar]
- McDonald, R.; Hannan, K.; Neylon, T.; Wells, M.; Reynar, J. Structured Models for Fine-to-Coarse Sentiment Analysis. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, 25–27 June 2007; pp. 432–439. [Google Scholar]
- Paulus, R.; Socher, R.; Manning, C.D. Global Belief Recursive Neural Networks. In Advances in Neural Information Processing Systems, Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2014; Volume 27. [Google Scholar]
- Read, J.; Carroll, J. Weakly Supervised Techniques for Domain-Independent Sentiment Classification. In Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion (TSA ’09), New York, NY, USA, 6 November 2009; pp. 45–52. [Google Scholar] [CrossRef]
- Moraes, R.; Valiati, J.F.; Gavião Neto, W.P. Document-Level Sentiment Classification: An Empirical Comparison between SVM and ANN. Expert Syst. Appl. 2013, 40, 621–633. [Google Scholar] [CrossRef]
- Huang, E.H.; Socher, R.; Manning, C.D.; Ng, A.Y. Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jeju Island, Republic of Korea, 8–14 July 2012; pp. 873–882. [Google Scholar]
- Socher, R.; Huval, B.; Manning, C.D.; Ng, A.Y. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Republic of Korea, 8–14 July 2012; pp. 1201–1211. [Google Scholar]
- Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C.D.; Ng, A.Y.; Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1631–1642. [Google Scholar]
- Kalchbrenner, N.; Grefenstette, E.; Blunsom, P. A Convolutional Neural Network for Modelling Sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA, 22–27 June 2014; pp. 655–665. [Google Scholar] [CrossRef]
- Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1746–1751. [Google Scholar] [CrossRef]
- Wang, X.; Liu, Y.; Sun, C.; Wang, B.; Wang, X. Predicting Polarities of Tweets by Composing Word Embeddings with Long Short-Term Memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015; pp. 1343–1353. [Google Scholar] [CrossRef]
- Dong, L.; Wei, F.; Tan, C.; Tang, D.; Zhou, M.; Xu, K. Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, MD, USA, 22–27 June 2014; pp. 49–54. [Google Scholar] [CrossRef]
- Wang, J.; Yu, L.C.; Lai, K.R.; Zhang, X. Dimensional Sentiment Analysis Using a Regional CNN-LSTM Model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany, 7–12 August 2016; pp. 225–230. [Google Scholar] [CrossRef]
- Wang, X.; Jiang, W.; Luo, Z. Combination of Convolutional and Recurrent Neural Network for Sentiment Analysis of Short Texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; pp. 2428–2437. [Google Scholar]
- Kapukaranov, B.; Nakov, P. Fine-grained sentiment analysis for movie reviews in Bulgarian. In Proceedings of the International Conference Recent Advances in Natural Language Processing, Hissar, Bulgaria, 7–9 September 2015; pp. 266–274. [Google Scholar]
- Georgieva-Trifonova, T.; Stefanova, M.; Kalchev, S. Customer Feedback Text Analysis for Online Stores Reviews in Bulgarian. IAENG Int. J. Comput. Sci. 2018, 45, 560–568. [Google Scholar]
- Lazarova, G.; Koychev, I. Semi-supervised multi-view sentiment analysis. In Computational Collective Intelligence; Springer: Berlin/Heidelberg, Germany, 2015; pp. 181–190. [Google Scholar]
- Osenova, P.; Simov, K.I. The Political Speech Corpus of Bulgarian. In Proceedings of the LREC, Online, 21–27 May 2012; pp. 1744–1747. [Google Scholar]
- Smailović, J.; Kranjc, J.; Grčar, M.; Žnidaršič, M.; Mozetič, I. Monitoring the Twitter sentiment during the Bulgarian elections. In Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Paris, France, 19–21 October 2015; pp. 1–10. [Google Scholar]
- Hristova, G. Text Analytics in Bulgarian: An Overview and Future Directions. Cybern. Inf. Technol. 2021, 21, 3–23. [Google Scholar] [CrossRef]
- Steinberger, J.; Ebrahim, M.; Ehrmann, M.; Hurriyetoglu, A.; Kabadjov, M.; Lenkova, P.; Steinberger, R.; Tanev, H.; Vázquez, S.; Zavarella, V. Creating sentiment dictionaries via triangulation. Decis. Support Syst. 2012, 53, 689–694. [Google Scholar] [CrossRef]
- Veselovská, K. Sentence-level sentiment analysis in Czech. In Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics, Craiova, Romania, 13–15 June 2012; pp. 1–4. [Google Scholar]
- Habernal, I.; Brychcín, T. Semantic spaces for sentiment analysis. In Proceedings of the International Conference on Text, Speech and Dialogue, Pilsen, Czech Republic, 1–5 September 2013; pp. 484–491. [Google Scholar]
- Çano, E.; Bojar, O. Sentiment Analysis of Czech Texts: An Algorithmic Survey. In Proceedings of the 11th International Conference on Agents and Artificial Intelligence, Prague, Czech Republic, 19–21 February 2019; Rocha, A.P., Steels, L., van den Herik, H.J., Eds.; SciTePress: Setúbal, Portugal, 2019; pp. 973–979. [Google Scholar] [CrossRef]
- Klouda, I.K.; Langr, L.; Ing, D.V. Product Review Sentiment Analysis in the Czech Language Student. Bachelor’s Thesis, Czech Technical University in Prague, Prague, Czech Republic, 2019. [Google Scholar]
- Sido, J.; Prazák, O.; Pribán, P.; Pasek, J.; Seják, M.; Konopík, M. Czert—Czech BERT-like Model for Language Representation. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), Online, 1–3 September 2021; Angelova, G., Kunilovskaya, M., Mitkov, R., Nikolova-Koleva, I., Eds.; INCOMA Ltd.: Hyderabad, India, 2021; pp. 1326–1338. [Google Scholar]
- Straka, M.; Náplava, J.; Straková, J.; Samuel, D. RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model. arXiv 2021, arXiv:2105.11314. [Google Scholar]
- Vysušilová, P.; Straka, M. Sentiment Analysis (Czech Model). LINDAT/CLARIAH-CZ Digital Library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University. 2021. Available online: http://hdl.handle.net/11234/1-4601 (accessed on 23 April 2023).
- Agić, Ž.; Ljubešić, N.; Tadić, M. Towards sentiment analysis of financial texts in Croatian. Bull Mark. 2010, 143, 69. [Google Scholar]
- Agic, Z.; Merkler, D. Rule-Based Sentiment Analysis in Narrow Domain: Detecting Sentiment in Daily Horoscopes Using Sentiscope. In Proceedings of the 2nd Workshop on Sentiment Analysis Where AI Meets Psychology, Mumbai, India, 15 December 2012; pp. 115–124. [Google Scholar]
- Jakopović, H.; Mikelić Preradović, N. Identifikacija Online Imidža Organizacija Temeljem Analize Sentimenata Korisnički Generiranog Sadržaja na Hrvatskim Portalima. Med. Istraž. 2016, 22, 63–82. [Google Scholar] [CrossRef]
- Glavaš, G.; Korenčić, D.; Šnajder, J. Aspect-oriented opinion mining from user reviews in Croatian. In Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing, Sofia, Bulgaria, 8–9 August 2013; pp. 18–23. [Google Scholar]
- Mozetič, I.; Grčar, M.; Smailović, J. Multilingual Twitter sentiment classification: The role of human annotators. PLoS ONE 2016, 11, e0155036. [Google Scholar] [CrossRef] [PubMed]
- Rotim, L.; Šnajder, J. Comparison of short-text sentiment analysis methods for Croatian. In Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, Valencia, Spain, 4 April 2017; pp. 69–75. [Google Scholar]
- Robnik-Šikonja, M.; Reba, K.; Mozetič, I. Cross-lingual transfer of sentiment classifiers. Slovenščina 2.0: Empirical. Appl. Interdiscip. Res. 2021, 9, 1–25. [Google Scholar] [CrossRef]
- Lula, P.; Wójcik, K. Sentiment analysis of consumer opinions written in Polish. Econ. Manag. 2011, 16, 1286–1291. [Google Scholar]
- Haniewicz, K.; Rutkowski, W.; Adamczyk, M.; Kaczmarek, M. Towards the lexicon-based sentiment analysis of Polish texts: Polarity lexicon. In Proceedings of the International Conference on Computational Collective Intelligence, Craiova, Romania, 11–13 September 2013; pp. 286–295. [Google Scholar]
- Rybiński, K. Political sentiment analysis of press freedom. Stud. Medioznawcze 2018, 2018, 31–48. [Google Scholar] [CrossRef]
- Zaśko-Zielińska, M.; Piasecki, M.; Szpakowicz, S. A large wordnet-based sentiment lexicon for Polish. In Proceedings of the International Conference Recent Advances in Natural Language Processing, Hissar, Bulgaria, 7–9 September 2015; pp. 721–730. [Google Scholar]
- Bartusiak, R.; Augustyniak, L.; Kajdanowicz, T.; Kazienko, P. Sentiment Analysis for Polish Using Transfer Learning Approach. In Proceedings of the 2015 Second European Network Intelligence Conference, Karlskrona, Sweden, 21–22 September 2015; pp. 53–59. [Google Scholar] [CrossRef]
- Kocoń, J.; Zaśko-Zielińska, M.; Miłkowski, P. Multi-level analysis and recognition of the text sentiment on the example of consumer opinions. In Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP 2019), Varna, Bulgaria, 2–4 September 2019. [Google Scholar]
- Wawer, A.; Sobiczewska, J. Predicting Sentiment of Polish Language Short Texts. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), Varna, Bulgaria, 2–4 September 2019; pp. 1321–1327. [Google Scholar]
- Kuznetsova, E.S.; Loukachevitch, N.V.; Chetviorkin, I.I. Testing rules for a sentiment analysis system. In Proceedings of International Conference Dialog, Metz, France, 22–24 August 2013; Volume 2, pp. 71–80. [Google Scholar]
- Chetviorkin, I.; Loukachevitch, N.V. Evaluating Sentiment Analysis Systems in Russian. In Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing, BSNLP@ACL 2013, Sofia, Bulgaria, 8–9 August 2013; Piskorski, J., Pivovarova, L., Tanev, H., Yangarber, R., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2013; pp. 12–17. [Google Scholar]
- Golubev, A.; Loukachevitch, N.V. Improving Results on Russian Sentiment Datasets. arXiv 2020, arXiv:2007.14310. [Google Scholar]
- Golubev, A.; Loukachevitch, N.V. Transfer Learning for Improving Results on Russian Sentiment Datasets. arXiv 2021, arXiv:2107.02499. [Google Scholar]
- Smetanin, S.; Komarov, M. Deep transfer learning baselines for sentiment analysis in Russian. Inf. Process. Manag. 2021, 58, 102484. [Google Scholar] [CrossRef]
- Machová, K.; Mikula, M.; Gao, X.; Mach, M. Lexicon-based Sentiment Analysis Using the Particle Swarm Optimization. Electronics 2020, 9, 1317. [Google Scholar] [CrossRef]
- Bučar, J.; Povh, J.; Žnidaršič, M. Sentiment classification of the Slovenian news texts. In Proceedings of the 9th International Conference on Computer Recognition Systems CORES, Wroclaw, Poland, 25–27 May 2015; Springer: Berlin/Heidelberg, Germany, 2016; pp. 777–787. [Google Scholar]
- Bučar, J. Manually Sentiment Annotated Slovenian News Corpus SentiNews 1.0. Slovenian Language Resource Repository CLARIN.SI. 2017. Available online: http://hdl.handle.net/11356/1110 (accessed on 23 April 2023).
- Žitnik, S. Slovene Corpus for Aspect-Based Sentiment Analysis—SentiCoref 1.0. Slovenian Language Resource Repository CLARIN.SI. 2019. Available online: https://www.clarin.si/repository/xmlui/handle/11356/1285 (accessed on 21 April 2023).
- Pelicon, A.; Pranjić, M.; Miljković, D.; Škrlj, B.; Pollak, S. Sentiment Annotated Dataset of Croatian News. Slovenian Language Resource Repository CLARIN.SI. 2020. Available online: https://www.clarin.si/repository/xmlui/handle/11356/1342 (accessed on 20 April 2023).
- Pelicon, A.; Pranjic, M.; Miljković, D.; Škrlj, B.; Pollak, S. Zero-Shot Learning for Cross-Lingual News Sentiment Classification. Appl. Sci. 2020, 10, 5993. [Google Scholar] [CrossRef]
- Kadunc, K.; Robnik-Šikonja, M. Opinion corpus of Slovene Web Commentaries KKS 1.001. Slovenian Language Resource Repository CLARIN.SI. 2017. Available online: https://www.clarin.si/repository/xmlui/handle/11356/1115 (accessed on 20 April 2023).
- Ljubešić, N.; Fišer, D.; Erjavec, T.; Šulc, A. Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.1. Slovenian Language Resource Repository CLARIN.SI. 2021. Available online: https://www.clarin.si/repository/xmlui/handle/11356/1462 (accessed on 20 April 2023).
- Evkoski, B.; Pelicon, A.; Mozetič, I.; Ljubešić, N.; Kralj Novak, P. Slovenian Twitter Dataset 2018–2020 1.0. Slovenian language resource repository CLARIN.SI. 2021. Available online: https://www.clarin.si/repository/xmlui/handle/11356/1423 (accessed on 20 April 2023).
- Cotterell, R.; Heigold, G. Cross-lingual Character-Level Neural Morphological Tagging. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, 9–11 September 2017; Palmer, M., Hwa, R., Riedel, S., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 748–759. [Google Scholar] [CrossRef]
- Lin, Y.; Chen, C.; Lee, J.; Li, Z.; Zhang, Y.; Xia, M.; Rijhwani, S.; He, J.; Zhang, Z.; Ma, X.; et al. Choosing Transfer Languages for Cross-Lingual Learning. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, 28 July–2 August 2019; Korhonen, A., Traum, D.R., Màrquez, L., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; Volume 1: Long Papers, pp. 3125–3135. [Google Scholar] [CrossRef]
- Mihalcea, R.; Banea, C.; Wiebe, J. Learning Multilingual Subjective Language via Cross-Lingual Projections. In Proceedings of the ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic, 23–30 June 2007; Carroll, J.A., van den Bosch, A., Zaenen, A., Eds.; The Association for Computational Linguistics: Stroudsburg, PA, USA, 2007. [Google Scholar]
- Feng, Y.; Wan, X. Towards a unified end-to-end approach for fully unsupervised cross-lingual sentiment analysis. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), Hong Kong, China, 3–4 November 2019; pp. 1035–1044. [Google Scholar]
- Kanayama, H.; Nasukawa, T.; Watanabe, H. Deeper Sentiment Analysis Using Machine Translation Technology. In Proceedings of the COLING 2004: 20th International Conference on Computational Linguistics, Geneva, Switzerland, 23–27 August 2004; pp. 494–500. [Google Scholar]
- Galeshchuk, S.; Qiu, J.; Jourdan, J. Sentiment Analysis for Multilingual Corpora. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, BSNLP@ACL 2019, Florence, Italy, 2 August 2019; Erjavec, T., Marcinczuk, M., Nakov, P., Piskorski, J., Pivovarova, L., Snajder, J., Steinberger, J., Yangarber, R., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 120–125. [Google Scholar] [CrossRef]
- Lohar, P.; Afli, H.; Way, A. Maintaining Sentiment Polarity of Translated User Generated Content. Prague Bull. Math. Linguist. 2017, 108, 73–84. [Google Scholar] [CrossRef]
- Lohar, P.; Afli, H.; Way, A. Balancing Translation Quality and Sentiment Preservation. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas, Boston, MA, USA, 17–21 March 2018; pp. 81–88. [Google Scholar]
- Vulic, I.; Moens, M. Cross-Lingual Semantic Similarity of Words as the Similarity of Their Semantic Word Responses. In Proceedings of the 2013 Conference of the North American Chapter of the Association of Computational Linguistics: Human Language Technologies, Atlanta, GA, USA, 9–14 June 2013; Vanderwende, L., Daumé, H., III, Kirchhoff, K., Eds.; The Association for Computational Linguistics: Stroudsburg, PA, USA, 2013; pp. 106–116. [Google Scholar]
- Conneau, A.; Wu, S.; Li, H.; Zettlemoyer, L.; Stoyanov, V. Emerging Cross-lingual Structure in Pretrained Language Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 5–10 July 2020; Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 6022–6034. [Google Scholar] [CrossRef]
- Li, Z.; Zhang, Y.; Wei, Y.; Wu, Y.; Yang, Q. End-to-End Adversarial Memory Network for Cross-domain Sentiment Classification. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, 19–25 August 2017; Sierra, C., Ed.; AAAI Press: Washington, DC, USA, 2017; pp. 2237–2243. [Google Scholar] [CrossRef]
- Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional Adversarial Domain Adaptation. In Advances in Neural Information Processing Systems, Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31. [Google Scholar]
- Fei, H.; Li, P. Cross-lingual unsupervised sentiment classification with multi-view transfer learning. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5759–5771. [Google Scholar]
- Dong, D.; Wu, H.; He, W.; Yu, D.; Wang, H. Multi-task learning for multiple language translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015; pp. 1723–1732. [Google Scholar]
- Johnson, M.; Schuster, M.; Le, Q.V.; Krikun, M.; Wu, Y.; Chen, Z.; Thorat, N.; Viégas, F.; Wattenberg, M.; Corrado, G.; et al. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Trans. Assoc. Comput. Linguist. 2017, 5, 339–351. [Google Scholar] [CrossRef]
- Habernal, I.; Ptáček, T.; Steinberger, J. Sentiment analysis in Czech social media using supervised machine learning. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Atlanta, GA, USA, 14 June 2013; pp. 65–74. [Google Scholar]
- Pang, B.; Lee, L.; Vaithyanathan, S. Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), Philadelphia, PA, USA, 6–7 July 2002; Association for Computational Linguistics: Stroudsburg, PA, USA, 2002; pp. 79–86. [Google Scholar] [CrossRef]
- Keung, P.; Lu, Y.; Szarvas, G.; Smith, N.A. The Multilingual Amazon Reviews Corpus. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 4563–4568. [Google Scholar] [CrossRef]
- Pecar, S.; Simko, M.; Bielikova, M. Improving Sentiment Classification in Slovak Language. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, Florence, Italy, 2 August 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019. [Google Scholar]
- McDonald, R.; Petrov, S.; Hall, K. Multi-Source Transfer of Delexicalized Dependency Parsers. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, 27–31 July 2011; pp. 62–72. [Google Scholar]
- Thongtan, T.; Phienthrakul, T. Sentiment Classification Using Document Embeddings Trained with Cosine Similarity. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Florence, Italy, 28 July–2 August 2019; pp. 407–414. [Google Scholar] [CrossRef]
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; pp. 38–45. [Google Scholar] [CrossRef]
- Wu, Z.; Saito, S. HiNet: Hierarchical Classification with Neural Network. arXiv 2017, arXiv:1705.11105. [Google Scholar]
- Thakkar, G.; Preradovic, N.M.; Tadic, M. Multi-task Learning for Cross-Lingual Sentiment Analysis. In Proceedings of the 2nd International Workshop on Cross-Lingual Event-Centric Open Analytics Co-Located with the 30th The Web Conference (WWW 2021), Ljubljana, Slovenia, 12 April 2021; Demidova, E., Hakimov, S., Winters, J., Tadic, M., Eds.; Sun SITE Central Europe: Aachen, Germany, 2021; Volume 2829, CEUR Workshop Proceedings. pp. 76–84. [Google Scholar]
- Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P.P. Natural Language Processing (Almost) from Scratch. J. Mach. Learn. Res. 2011, 12, 2493–2537. [Google Scholar]
- Dror, R.; Shlomov, S.; Reichart, R. Deep Dominance—How to Properly Compare Deep Neural Models. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; Korhonen, A., Traum, D., Màrquez, L., Eds.; pp. 2773–2785. [Google Scholar] [CrossRef]
- Ulmer, D.; Hardmeier, C.; Frellsen, J. deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks. arXiv 2022, arXiv:2204.06815. [Google Scholar]
- Del Barrio, E.; Cuesta-Albertos, J.A.; Matrán, C. An optimal transportation approach for assessing almost stochastic order. In The Mathematics of the Uncertain; Springer: Berlin/Heidelberg, Germany, 2018; pp. 33–44. [Google Scholar]
- Yeh, A. More accurate tests for the statistical significance of result differences. In Proceedings of the COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics, Saarbrücken, Germany, 31 July–4 August 2000. [Google Scholar]
- Přibáň, P.; Steinberger, J. Are the Multilingual Models Better? Improving Czech Sentiment with Transformers. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), Online, 1–3 September 2021; pp. 1138–1149. [Google Scholar]
- Pikuliak, M.; Grivalský, Š.; Konôpka, M.; Blšták, M.; Tamajka, M.; Bachratý, V.; Simko, M.; Balážik, P.; Trnka, M.; Uhlárik, F. SlovakBERT: Slovak Masked Language Model. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; Goldberg, Y., Kozareva, Z., Zhang, Y., Eds.; pp. 7156–7168. [Google Scholar] [CrossRef]
- Přibáň, P.; Šmíd, J.; Steinberger, J.; Mištera, A. A comparative study of cross-lingual sentiment analysis. Expert Syst. Appl. 2024, 247, 123247. [Google Scholar] [CrossRef]
Language | Dataset | Train | Validate | Test |
---|---|---|---|---|
Bulgarian | Cinexio | 5520 | 614 | 682 |
Croatian | Pauza | 2277 | | 1033 |
Czech | CSFD | 63,966 | 13,707 | 13,707 |
English | MARC | 200,000 | 5000 | 5000 |
Polish | all2 | 28,581 | 3572 | 3572 |
Russian | ROMIP 2012 | 4000 | 260 | 5500 |
Slovak | Reviews3 | 3834 | 661 | 1235 |
Slovene | KKS | 3977 | 200 | 600 |
1st Source Language | 2nd Source Language | 3rd Source Language | 4th Source Language |
---|---|---|---|
Bulgarian | English | ||
Croatian | English | ||
Czech | English | ||
Polish | English | ||
Russian | English | ||
Slovak | English | ||
Slovene | English | ||
Bulgarian | Russian | ||
Croatian | Russian | ||
Czech | Russian | ||
Polish | Russian | ||
Slovak | Russian | ||
Slovene | Russian | ||
Bulgarian | |||
Croatian | |||
Czech | |||
Polish | |||
Russian | |||
Slovak | |||
Slovene | |||
Croatian | Slovene | ||
Croatian | Slovene | Slovak | |
Croatian | Slovene | Slovak | Bulgarian |
Czech | Bulgarian | ||
Czech | Croatian | ||
Czech | Slovak | ||
Czech | Slovene | ||
Polish | Bulgarian | ||
Polish | Croatian | ||
Polish | Slovak | ||
Polish | Slovene | ||
Bulgarian | Croatian | ||
Bulgarian | Slovak | ||
Bulgarian | Slovene |
1st Source Language | 2nd Source Language | 3rd Source Language |
---|---|---|
Bulgarian (Latin) | ||
Russian (Latin) | ||
Bulgarian (Latin) | Russian (Latin) | |
Russian (Latin) | Croatian | |
Bulgarian (Latin) | Croatian | |
Russian (Latin) | Slovak | |
Russian (Latin) | Slovene | |
Bulgarian (Latin) | English | |
Russian (Latin) | English | |
Bulgarian (Latin) | Polish | |
Russian (Latin) | Polish | |
Bulgarian (Latin) | Czech | |
Russian (Latin) | Czech | |
Bulgarian (Latin) | Slovene | Slovak |
Russian (Latin) | Slovene | Slovak |
Language | Acc-3 | F1-3 |
---|---|---|
Bulgarian | 67.80 (0.0076) | 69.42 (0.0046) |
Croatian | 62.37 (0.004) | 57.47 (0.0053) |
Czech | 83.82 (0.0037) | 83.76 * (0.0033) |
English | 68.15 (0.0076) | 67.85 (0.0100) |
Polish | 87.70 (0.0033) | 87.57 * (0.0039) |
Russian | 71.43 (0.0013) | 70.20 (0.0030) |
Slovak | 81.60 (0.0057) | 79.75 (0.0017) |
Slovene | 59.13 (0.0180) | 59.97 (0.0307) |
Target Language | Source Languages | 5-Class Accuracy | 5-Class F1 | 3-Class Accuracy | 3-Class F1 |
---|---|---|---|---|---|
Bulgarian | Bulgarian English | 53.37 (0.0123) | 54.60 * (0.0097) | 72.73 (0.0142) | 74.22 * (0.7422) |
Bulgarian | Bulgarian Czech | 52.18 (0.0070) | 53.14 (0.0106) | 72.79 (0.0098) | 74.11 * (0.0081) |
Croatian | Croatian English | 54.12 * (0.0186) | 53.80 * (0.0163) | 74.07 (0.0121) | 74.12 (0.0097) |
Croatian | Croatian Czech | 50.88 (0.0094) | 50.12 (0.0251) | 74.69 (0.0107) | 75.82 * (0.0106) |
Czech | Czech Croatian | | | 82.29 (0.0035) | 82.24 (0.0036) |
English | Czech English | 56.22 (0.0099) | 55.36 (0.0123) | 69.09 (0.0035) | 69.06 * (0.0043) |
English | Bulgarian (Latin) English | 56.91 (0.0031) | 56.78 (0.0042) | 68.36 (0.0086) | 68.05 (0.0103) |
Polish | Bulgarian (Latin) Polish | 52.34 (0.0017) | 52.28 (0.0012) | 87.05 (0.0028) | 87.15 * (0.0016) |
Polish | Russian (Latin) Polish | 52.19 (0.0010) | 52.15 (0.0005) | 86.92 (0.0016) | 87.00 * (0.0007) |
Russian | Bulgarian Russian | | | 71.84 (0.0035) | 71.31 (0.0022) |
Slovak | Slovak English | 68.87 (0.0351) | 68.03 * (0.016) | 83.51 (0.0182) | 82.14 (0.0076) |
Slovak | Slovak Croatian Slovene | 64.47 (0.0135) | 58.71 (0.0441) | 85.36 (0.0046) | 83.44 * (0.0064) |
Slovene | Slovene English | | | 69.52 * (0.0203) | 68.97 * (0.0154) |
Slovene | Slovene Czech | | | 68.24 * (0.0084) | 69.56 * (0.0078) |
Bulgarian (Latin) | Bulgarian (Latin) English | 50.73 (0.0094) | 51.76 (0.0075) | 70.30 (0.0093) | 72.01 (0.0071) |
Russian (Latin) | Russian (Latin) English | | | 88.14 * (0.0299) | 87.95 * (0.0290) |
Language | Metric | 5-Class | 3-Class | 2-Class |
---|---|---|---|---|
Bulgarian [51] | MSE | 0.666 | 0.141 | |
Croatian [68] | F1 | 91.1 | ||
Czech [122] | F1 | 87.08 ± 0.11 | 96.00 ± 0.02 | |
English [110] | ACC | 56.5 | ||
Russian [82] | F1 | 72.69 | 87.04 | |
Slovak [123] (http://arl6.library.sk/nlp4sk/webapi/analyza-sentimentu, accessed on 23 April 2023) | F1 | 81.5 | ||
Slovene [90] | F1 | 65.7 |
Language | Size (GB) | Tokens (Million) |
---|---|---|
Bulgarian | 57.5 | 5487 |
Croatian | 20.5 | 3297 |
Czech | 16.3 | 2498 |
English | 300.8 | 55,608 |
Polish | 44.6 | 6490 |
Russian | 278.0 | 23,408 |
Slovak | 23.2 | 3525 |
Slovene | 10.3 | 1669 |
Languages | Hr | Cs | Pl | Ru | Sk | Bg Latin | Ru Latin | Sl | En |
---|---|---|---|---|---|---|---|---|---|
Bulgarian | 130 | 235 | 90 | 2919 | 123 | 261 | 126 | 122 | 215 |
Croatian | | 5075 | 2881 | 432 | 2215 | 1778 | 3014 | 4420 | 3256 |
Czech | | | 9656 | 1300 | 6035 | 3573 | 8733 | 10,075 | 15,122 |
Polish | | | | 690 | 2927 | 2207 | 5075 | 5417 | 6931 |
Russian | | | | | 371 | 314 | 1522 | 733 | 1207 |
Slovak | | | | | | 1616 | 2923 | 3412 | 2774 |
Bulgarian (Latin) | | | | | | | 2689 | 2655 | 2416 |
Russian (Latin) | | | | | | | | 5799 | 5702 |
Slovene | | | | | | | | | 6352 |
English | | | | | | | | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Thakkar, G.; Preradović, N.M.; Tadić, M. Transferring Sentiment Cross-Lingually within and across Same-Family Languages. Appl. Sci. 2024, 14, 5652. https://doi.org/10.3390/app14135652