What Is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models
Abstract
1. Introduction
- We create multilingual gender lexicons to detect sentences with gendered words in Chinese, English, German, Portuguese, and Spanish without relying on parallel datasets, which enables us to extract more diverse sets of gendered sentences and facilitate more robust evaluations.
- We present two novel metrics, with accompanying sentence-generation methods, that enable rigorous and fair comparison of gender bias across multilingual MLMs and datasets, leading to more reliable and comprehensive assessments.
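The lexicon-based idea in the first contribution can be illustrated with a short sketch: given a gender lexicon mapping each gendered word to its counterpart, detect sentences containing gendered words and produce evaluation pairs by swapping each gendered word for its counterpart. The lexicon entries and function names below are illustrative placeholders, not the paper's actual MGL or implementation.

```python
# Illustrative sketch of lexicon-based sentence generation (LSG).
# The tiny English lexicon here is a placeholder; the paper's MGL
# covers five languages and far more entries.

GENDER_LEXICON = {
    "en": {"he": "she", "she": "he", "man": "woman", "woman": "man",
           "father": "mother", "mother": "father"},
}

def gendered_words(sentence, lang="en"):
    """Return the gendered tokens found in a whitespace-tokenized sentence."""
    lexicon = GENDER_LEXICON[lang]
    return [tok for tok in sentence.lower().split() if tok in lexicon]

def make_pair(sentence, lang="en"):
    """Swap every gendered word for its lexicon counterpart.

    Returns (original, swapped) for sentences containing gendered words,
    or None when no gendered word is present (the sentence is skipped).
    """
    lexicon = GENDER_LEXICON[lang]
    tokens = sentence.lower().split()
    if not any(tok in lexicon for tok in tokens):
        return None
    swapped = " ".join(lexicon.get(tok, tok) for tok in tokens)
    return " ".join(tokens), swapped
```

A bias metric would then compare a masked language model's pseudo-likelihood for the two sentences in each pair; a systematic preference for one side indicates gender bias. Note that real multilingual lexicons must also handle morphological agreement (articles, adjective endings), which this whitespace-level sketch ignores.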
2. Related Work
3. Methodology
3.1. Multilingual Bias Evaluation
3.2. Strict Bias Metric
3.3. Lexicon-Based Sentence Generation
3.4. Model-Based Sentence Generation
3.5. Direct Comparison Bias Metric
4. Data Preparation
4.1. Multilingual Gender Lexicon
4.2. MGL Validation
4.3. Sentence Pair Generation
5. Experiments
5.1. Multilingual Bias Evaluation on Kaneko*
5.2. Strict Bias Metric for LSG and MSG
5.3. Direct Comparison Bias Metric for MSG
6. Analysis
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
NLP | Natural Language Processing
MLMs | Masked Language Models
MBE | Multilingual Bias Evaluation
AULA | All Unmasked Likelihood with Attention
SBM | Strict Bias Metric
LSG | Lexicon-based Sentence Generation
MSG | Model-based Sentence Generation
DBM | Direct Comparison Bias Metric
MGL | Multilingual Gender Lexicon
References
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
- Hartvigsen, T.; Gabriel, S.; Palangi, H.; Sap, M.; Ray, D.; Kamar, E. ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection. In Proceedings of the ACL 2022, Dublin, Ireland, 22–27 May 2022.
- Bender, E.M.; Friedman, B. Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. Trans. Assoc. Comput. Linguist. 2018, 6, 587–604.
- Dixon, L.; Li, J.; Sorensen, J.; Thain, N.; Vasserman, L. Measuring and Mitigating Unintended Bias in Text Classification. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, New Orleans, LA, USA, 2–3 February 2018; pp. 67–73.
- Hutchinson, B.; Prabhakaran, V.; Denton, E.; Webster, K.; Zhong, Y.; Denuyl, S. Social Biases in NLP Models as Barriers for Persons with Disabilities. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5491–5501.
- Clark, K.; Luong, M.T.; Le, Q.V.; Manning, C.D. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
- Tan, H.; Bansal, M. LXMERT: Learning Cross-Modality Encoder Representations from Transformers. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5100–5111.
- Kurita, K.; Vyas, N.; Pareek, A.; Black, A.W.; Tsvetkov, Y. Measuring Bias in Contextualized Word Representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, Florence, Italy, 2 August 2019; pp. 166–172.
- Zhao, J.; Wang, T.; Yatskar, M.; Ordonez, V.; Chang, K.W. Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, LA, USA, 1–6 June 2018; pp. 15–20.
- Blodgett, S.L.; Barocas, S.; Daumé III, H.; Wallach, H. Language (Technology) is Power: A Critical Survey of “Bias” in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5454–5476.
- Bender, E.M.; Gebru, T.; McMillan-Major, A.; Shmitchell, S. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21), New York, NY, USA, 3–10 March 2021; pp. 610–623.
- Sun, T.; Gaut, A.; Tang, S.; Huang, Y.; ElSherief, M.; Zhao, J.; Mirza, D.; Belding, E.; Chang, K.W.; Wang, W.Y. Mitigating Gender Bias in Natural Language Processing: Literature Review. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1630–1640.
- Bolukbasi, T.; Chang, K.W.; Zou, J.; Saligrama, V.; Kalai, A. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In Proceedings of the NeurIPS, Barcelona, Spain, 5–10 December 2016.
- Zhao, J.; Zhou, Y.; Li, Z.; Wang, W.; Chang, K.W. Learning Gender-Neutral Word Embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4847–4853.
- Liang, S.; Dufter, P.; Schütze, H. Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 5082–5093.
- Webster, K.; Wang, X.; Tenney, I.; Beutel, A.; Pitler, E.; Pavlick, E.; Chen, J.; Petrov, S. Measuring and Reducing Gendered Correlations in Pre-trained Models. arXiv 2020, arXiv:2010.06032.
- Bommasani, R.; Davis, K.; Cardie, C. Interpreting Pretrained Contextualized Representations via Reductions to Static Embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 4758–4781.
- Nangia, N.; Vania, C.; Bhalerao, R.; Bowman, S.R. CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 1953–1967.
- Nadeem, M.; Bethke, A.; Reddy, S. StereoSet: Measuring Stereotypical Bias in Pretrained Language Models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 5356–5371.
- Blodgett, S.L.; Lopez, G.; Olteanu, A.; Sim, R.; Wallach, H. Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 1004–1015.
- Ahn, J.; Oh, A. Mitigating Language-Dependent Ethnic Bias in BERT. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 533–549.
- Durmus, E.; Ladhak, F.; Hashimoto, T. Spurious Correlations in Reference-Free Evaluation of Text Generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 1443–1454.
- Kaneko, M.; Bollegala, D. Unmasking the Mask—Evaluating Social Biases in Masked Language Models. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; pp. 11954–11962.
- Kaneko, M.; Imankulova, A.; Bollegala, D.; Okazaki, N. Gender Bias in Masked Language Models for Multiple Languages. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 2740–2750.
- Reimers, N.; Gurevych, I. Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Online, 16–20 November 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020.
- Dwiastuti, M. English-Indonesian Neural Machine Translation for Spoken Language Domains. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Florence, Italy, 28 July–2 August 2019; pp. 309–314.
- Al-Haj, H.; Lavie, A. The Impact of Arabic Morphological Segmentation on Broad-coverage English-to-Arabic Statistical Machine Translation. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers, Denver, CO, USA, 31 October–4 November 2010.
- Rozovskaya, A.; Roth, D. Grammar Error Correction in Morphologically Rich Languages: The Case of Russian. Trans. Assoc. Comput. Linguist. 2019, 7, 1–17.
- Chan, B.; Schweter, S.; Möller, T. German’s Next Language Model. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 6788–6796.
- Cañete, J.; Chaperon, G.; Fuentes, R.; Ho, J.H.; Kang, H.; Pérez, J. Spanish Pre-Trained BERT Model and Evaluation Data. In Proceedings of the PML4DC at ICLR 2020, Virtual, 25–30 April 2020.
- Souza, F.; Nogueira, R.; Lotufo, R. BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In Proceedings of the Intelligent Systems: 9th Brazilian Conference, BRACIS 2020, Rio Grande, Brazil, 20–23 October 2020; pp. 403–417.
- Cui, Y.; Che, W.; Liu, T.; Qin, B.; Wang, S.; Hu, G. Revisiting Pre-Trained Models for Chinese Natural Language Processing. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; pp. 657–668.
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; pp. 38–45.
Language | Total Terms | Covered Terms | Coverage (%)
---|---|---|---
German | 1226 | 1124 | 91.7 |
Spanish | 1380 | 1125 | 81.5 |
Portuguese | 1206 | 928 | 76.9 |
Chinese | 1325 | 997 | 75.2 |
Indonesian | 671 | 312 | 46.5 |
Russian | 1289 | 583 | 45.2 |
Japanese | 1288 | 466 | 36.6 |
Arabic | 1327 | 252 | 19.0 |
Language | Kaneko (org) | Kaneko (all) | LSG | MSG | Total
---|---|---|---|---|---|
English | — | 39,040 | 25,993 | 28,112 | 34,970 |
Chinese | 6800 | 36,270 | 22,196 | 22,616 | 30,547 |
German | 4700 | 26,639 | 32,436 | 29,667 | 33,154 |
Portuguese | 5700 | 29,975 | 24,608 | 31,670 | 36,072 |
Spanish | 7100 | 37,808 | 76,972 | 96,995 | 114,168 |
Language | Both | One | None |
---|---|---|---|
English | 63.8% | 16.6% | 19.6% |
Chinese | 59.1% | 14.9% | 26.0% |
Spanish | 51.9% | 33.1% | 15.0% |
Portuguese | 43.0% | 44.8% | 12.2% |
German | 30.7% | 58.8% | 10.5% |
Language | Kaneko (org) | Kaneko (all) | LSG | MSG | DBM | Male/Female (Ratio)
---|---|---|---|---|---|---|
English | — | 52.07 (±1.34) | 50.39 (±0.28) | 45.49 | 75.18 | 62.83/37.17 (1.69:1) |
Chinese | 52.86 | 46.67 (±0.55) | 46.42 (±0.68) | 53.15 | 89.62 | 63.67/36.33 (1.75:1) |
German | 54.69 | 45.78 (±1.72) | 52.31 (±0.64) | 55.43 | 44.72 | 48.92/51.08 (0.96:1) |
Portuguese | 53.07 | 46.70 (±0.81) | 51.77 (±0.44) | 61.04 | 73.36 | 65.89/34.11 (1.93:1) |
Spanish | 51.44 | 48.52 (±1.04) | 41.68 (±0.76) | 50.74 | 72.34 | 66.29/33.71 (1.97:1) |
Language | MBE | LSG | MSG
---|---|---|---|
English | 31.98 | 25.67 | 19.60 |
Chinese | 32.13 | 27.34 | 25.96 |
German | 35.35 | 2.17 | 10.51 |
Portuguese | 30.90 | 31.78 | 12.20 |
Spanish | 31.99 | 32.58 | 15.04 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yu, J.; Kim, S.U.; Choi, J.; Choi, J.D. What Is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models. Information 2024, 15, 549. https://doi.org/10.3390/info15090549