Unsupervised Word Sense Disambiguation Using Transformer’s Attention Mechanism
1. Introduction
Our Contribution
2. Related Work
- Topic models for word sense embeddings [17];
- “insect”, “fly”, “beetle”, “bugs” and “worm” for the “insect” sense;
- “virus”, “infection”, “crisis”, “disease” and “surprise” for the “disease” sense.
3. Lexical Sample Selection
- Have at least two senses in the Romanian Explanatory Dictionary (DEX) [23], but no more than five, to ease manual gold-standard annotation for each lemma. We strove to include words whose senses are rather close to each other, as well as words whose senses are more semantically distant. The distance between senses was judged by the intuition of the linguist who made the choices and who is also in charge of developing the Romanian WordNet.
- Are not homonymous. Unlike the senses of a polysemous word, homonyms are very distant semantically, which presumably makes semantic disambiguation rather easy, given that the contexts in which one homonym occurs are totally different from those of the other.
- Do not have rich collocation-driven senses. Whenever the senses described in the dictionary are exemplified with collocates, we leave the respective word aside, as collocates help to easily figure out the meaning (be it manually or automatically). The same applies to words whose senses are expression-dependent.
- Do not appear in the Romanian WordNet, as we are also interested in automatically extending the Romanian WordNet with new synsets (work outside the scope of this paper).
- Are frequent in the Reference Corpus of the Contemporary Romanian Language (CoRoLa) [24], as our unsupervised WSD algorithm needs a relatively large sample of sentences to train itself to map prototype sense examples to sense clusters. This criterion also ensures that selected lemmas have high coverage in other corpora.
pustiu, nici urmă de copii.
in the gang, it was deserted, no sign of children.
face cu respectarea statutului și specificului dogmatic și canonic al cultului
senate will be done in compliance with the statute and the dogmatic and
canonical specificity of the founding cult.
spațiul # aerian # belgian, ambele doborâte de vânătoarea germană.
the Belgian # airspace #, both shot down by German fighter planes.
copiii care au primit tratament cu Increlex.
children who received treatment with Increlex.
4. WSI and WSD Algorithms
- Cluster the example sentences of the target lemma (i.e., perform WSI) in both Romanian and English so that the numbers of clusters in the two languages, obtained independently, optimize a cluster overlap measure. We adopt the V-Measure, the cluster overlap measure used in SemEval-2010 Task 14 [12]. Intuitively, if translation preserves meaning, we expect roughly the same clusters in Romanian and English, so a cluster overlap measure should reach its optimum when the two languages have a similar number of clusters.
- In Romanian, train a BERT WSD model on each cluster of the target lemma to estimate the probability of assigning a sense of the lemma to that cluster. Together with the probability of the cluster, computed from the class distribution over the occurrences of the lemma, maximize the conditional probability of the cluster given the sense.
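The cluster-overlap criterion in the first step can be sketched in a few lines. The code below is a minimal, standard-library illustration of the V-Measure (the harmonic mean of homogeneity and completeness) and of a search over candidate cluster counts. The helper names `v_measure` and `best_cluster_counts`, and the assumption that Romanian and English labels are aligned by sentence index, are ours and are not taken from the paper.

```python
import math
from collections import Counter


def v_measure(labels_a, labels_b):
    """V-Measure between two labelings of the same, index-aligned sentences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    # Conditional entropies H(A|B) and H(B|A), computed from the joint counts.
    h_a_given_b = -sum((c / n) * math.log(c / count_b[b]) for (a, b), c in joint.items())
    h_b_given_a = -sum((c / n) * math.log(c / count_a[a]) for (a, b), c in joint.items())
    homogeneity = 1.0 if h_a == 0 else max(0.0, 1.0 - h_a_given_b / h_a)
    completeness = 1.0 if h_b == 0 else max(0.0, 1.0 - h_b_given_a / h_b)
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def best_cluster_counts(ro_by_k, en_by_k):
    """Return (score, k_ro, k_en) for the pair of clusterings with the best overlap.

    ro_by_k / en_by_k map a candidate number of clusters to the labels obtained
    with that number (hypothetical inputs, produced by any clustering routine).
    """
    return max(
        (v_measure(ro, en), k_ro, k_en)
        for k_ro, ro in ro_by_k.items()
        for k_en, en in en_by_k.items()
    )
```

With beta = 1 the V-Measure is symmetric, so it does not matter which language is treated as the reference labeling.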
4.1. The WSI Algorithm
- A BERT contextual vector: Given lemma L, search for its occurrence in the BERT-tokenized example sentence and take its BERT embedding from the last hidden state. If the BERT tokenizer splits the occurrence of lemma L into sub-tokens, take the element-wise sum of the BERT embeddings of the sub-tokens.
- An attention contextual vector: See below for a detailed description.
- A concatenation of the two: The BERT contextual vector is simply concatenated with the attention contextual vector.
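The sub-token summation and the concatenation described in the list above can be sketched as follows. This is an illustration on plain Python lists, not the authors' code: `hidden_states` stands in for the last hidden state of a BERT model (one vector per sub-token), and the function names are hypothetical.

```python
def lemma_vector(hidden_states, subtoken_positions):
    """Element-wise sum of the last-hidden-state vectors of the lemma's sub-tokens.

    hidden_states: list of equal-length float vectors, one per sub-token.
    subtoken_positions: indices of the sub-tokens that spell the lemma occurrence.
    """
    dim = len(hidden_states[0])
    total = [0.0] * dim
    for pos in subtoken_positions:
        for d in range(dim):
            total[d] += hidden_states[pos][d]
    return total


def concat_vector(bert_vec, attn_vec):
    """The 'both' representation: BERT vector followed by the attention vector."""
    return list(bert_vec) + list(attn_vec)
```

If the tokenizer keeps the lemma as a single token, `subtoken_positions` holds one index and the sum reduces to that token's embedding.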
- n is the index of a BERT hidden layer (BERT models have between 12 and 24 hidden layers).
- For each hidden layer n, m is the index of an attention head (each hidden layer contains 8 to 16 attention heads).
- For the m-th head of layer n, we take the softmaxed attention column vector of the word at position j in the BERT-tokenized sentence.
- From this vector, we return the top k (k = 3) list of (l, w) pairs, i.e., lemmas with their attention weights, that are closest to the lemma L.
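A rough sketch of the top-k step follows, under two assumptions of ours: "closest to the lemma L" is read as "receiving the largest attention weight", and the target position itself is excluded. The dictionary built by `collect_weights` (lemma to list of weights, one entry per head that selected the lemma) is our reading of the per-sentence structure the ranking methods operate on. All names are hypothetical.

```python
def top_k_attended(lemmas, attention_column, target_index, k=3):
    """Top-k (lemma, weight) pairs for one attention head.

    lemmas: lemma of each position in the sentence.
    attention_column: softmaxed attention weights of the target word, one per position.
    target_index: position of the target lemma L (skipped, by assumption).
    """
    candidates = [
        (lemma, weight)
        for i, (lemma, weight) in enumerate(zip(lemmas, attention_column))
        if i != target_index
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


def collect_weights(lemmas, attention_columns, target_index, k=3):
    """Accumulate, over all heads of all layers, the weights each lemma received."""
    weights_by_lemma = {}
    for column in attention_columns:
        for lemma, weight in top_k_attended(lemmas, column, target_index, k):
            weights_by_lemma.setdefault(lemma, []).append(weight)
    return weights_by_lemma
```

In a real pipeline the attention columns would come from the model's per-layer attention tensors; here they are plain lists so that the logic stays visible.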
- For each lemma l in Di, compute the average a of the weights in Di[l], create a list F1 of (l, a) pairs and sort it in descending order by a. We call this method mean.
- For each lemma l in Di and each weight w in Di[l], create a list F2 of (l, w) pairs and sort it in descending order by w. Eliminate every pair (l, w) whose lemma l already appears in a pair with a larger weight. We call this method max.
- For each lemma l in Di, compute the size s of its weight list Di[l], create a list F3 of (l, s) pairs and sort it in descending order by s. We call this method heads.
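The three ranking methods can be written down directly from their descriptions. The sketch below assumes Di is a dictionary mapping each lemma to its list of attention weights. The function names are ours, and ties are broken by insertion order, which the text does not specify.

```python
def rank_mean(weights_by_lemma):
    """F1: (lemma, average weight) pairs, sorted by average in descending order."""
    pairs = [(l, sum(ws) / len(ws)) for l, ws in weights_by_lemma.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def rank_max(weights_by_lemma):
    """F2: all (lemma, weight) pairs sorted by weight; keep each lemma's largest only."""
    pairs = [(l, w) for l, ws in weights_by_lemma.items() for w in ws]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    seen, kept = set(), []
    for lemma, weight in pairs:
        if lemma not in seen:  # a pair with a larger weight was not seen yet
            seen.add(lemma)
            kept.append((lemma, weight))
    return kept


def rank_heads(weights_by_lemma):
    """F3: (lemma, number of weights) pairs, sorted by that count in descending order."""
    pairs = [(l, len(ws)) for l, ws in weights_by_lemma.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Note that max is equivalent to taking the maximum weight per lemma; the sort-then-filter form is kept only because it mirrors the wording of the method.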
- From any of the lists F1, F2, or F3, we can send the top p_vocab_topk lemmas (p_vocab_topk is an integer parameter of this algorithm) to the vocabulary V.
- From lists F1 or F2, we can set a cutoff threshold p_vocab_cutoff on the associated weight and send to V the lemmas whose weight is greater than or equal to p_vocab_cutoff.
- From any of the lists, F1, F2, or F3, we can construct a global (i.e., for all example sentences) lemma frequency dictionary and select the top p_vocab_freq lemmas to constitute the vocabulary V.
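The three ways of filling the vocabulary V can be illustrated as below. The parameter names follow the text; the function names are hypothetical. Treating the "global lemma frequency" as the number of example sentences whose ranked list contains the lemma is an assumption on our part.

```python
from collections import Counter


def select_topk(ranked, p_vocab_topk):
    """Send the first p_vocab_topk lemmas of one ranked list (F1, F2 or F3) to V."""
    return [lemma for lemma, _ in ranked[:p_vocab_topk]]


def select_cutoff(ranked, p_vocab_cutoff):
    """Send the lemmas of F1 or F2 whose weight is at least p_vocab_cutoff to V."""
    return [lemma for lemma, weight in ranked if weight >= p_vocab_cutoff]


def select_freq(ranked_per_sentence, p_vocab_freq):
    """Count each lemma over all example sentences and keep the most frequent ones."""
    frequency = Counter(
        lemma for ranked in ranked_per_sentence for lemma, _ in ranked
    )
    return [lemma for lemma, _ in frequency.most_common(p_vocab_freq)]
```

The first two functions act on a single example sentence, while the third needs the ranked lists of every example sentence at once.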
4.2. The WSD Algorithm
- For every pair formed by a prototype example sentence of the sense and an example sentence from the cluster, we extract the BERT embeddings corresponding to the occurrence of lemma L and compute the cosine similarity between them. We sort all pairs by cosine similarity in descending order and keep the top 10% of examples with a cosine similarity of at least 0.7 as “Belonging to the sense” and the bottom 10% of examples as “Not belonging to the sense”. Together, they form the “train set” for mapping the sense to the cluster.
- We fine-tune the BERT model to classify the remaining 80% of examples in the cluster as either “Belonging to the sense” (label 1) or “Not belonging to the sense” (label 0). The classifier takes the BERT embedding of lemma L in the example sentence and stacks on it a two-neuron, fully connected classification layer with softmax, which is trained (along with the BERT parameters) on the “train set” produced in step 1.
- From the classification of the remaining 80% of examples in the cluster , we obtain example sentences that have been assigned label 0 or label 1. If we count the number of times that label 1 was assigned and divide it by (10% were already assumed to have label 1 in the train set), we obtain our estimate of .
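The selection of the "train set" in step 1 can be sketched as follows. For simplicity, the sketch assumes that each cluster example has already been reduced to a single cosine similarity score (for instance, its similarity to a prototype sentence of the sense). How several pairs per example are aggregated is not covered here, and the function names are ours.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_train_set(scored_examples, fraction=0.1, min_similarity=0.7):
    """Split off pseudo-labeled positives and negatives from scored examples.

    scored_examples: list of (example_id, cosine_similarity) pairs.
    Returns (positives, negatives): the top `fraction` of examples that also
    reach `min_similarity` (label 1) and the bottom `fraction` (label 0).
    """
    ranked = sorted(scored_examples, key=lambda pair: pair[1], reverse=True)
    n = max(1, round(len(ranked) * fraction))
    positives = [ex for ex, sim in ranked[:n] if sim >= min_similarity]
    negatives = [ex for ex, _ in ranked[-n:]]
    return positives, negatives
```

The examples left out of both lists are the ones the fine-tuned classifier labels in step 2.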
4.3. A Qualitative Comparison with the Current State-of-the-Art Unsupervised and Knowledge-Based WSD Algorithms
- It does not rely on structured sense inventories to function (e.g., WordNet), and it can run on very large corpora. The game theoretic approach proposed in [22] disambiguates all content words in a text simultaneously, thus having to be optimized when run on a corpus with hundreds of millions of words. Furthermore, it uses sense-annotated corpora to initialize the sense distribution vectors of the players, which makes it a semi-supervised WSD approach.
- It works with any pre-trained and language-specific Transformer-based LLM, as opposed to sense embedding models (e.g., PolyLM [19]) that must be pretrained on very large corpora first.
- It uses all the attention layers of the BERT model to build a richer contextualized representation of a word, as opposed to algorithms that only use the final BERT hidden state as the contextualized representation [20,21]. In this sense, our attention-based contextual representation is more like the word substitution representation from [18].
- It may fail if the target lemma occurs fewer than 100 times, as there are then too few examples in the cluster to estimate the sense-to-cluster mapping probability. This is why the algorithm is suited to large and very large corpora.
4.4. Case Study: Adjective “Aerian”
- k-means (kmn) clustering or agglomerative (agg) clustering with the “cosine” distance and the “average” linkage method.
- Clustering with BERT contextual vectors (bert), attention contextual vectors (attn) or a concatenation of both (both). The following parameters only apply when we do not use bert.
- p_vocab_method can be mean, max or heads.
- p_vocab_topk takes values from the list of 1, 2, 5 and 10.
- p_vocab_cutoff takes values from the list of 0.3, 0.5, 0.7 and 0.9.
- p_vocab_freq takes values from the list of 10, 20, 50 and 100.
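The parameter values listed above span a small search grid, which the sketch below enumerates. Two assumptions are ours: the three vocabulary parameters are treated as mutually exclusive alternatives, and p_vocab_cutoff is not combined with the heads method (since a cutoff is only described for lists F1 and F2). Whether the authors explored exactly this grid is not stated.

```python
from itertools import product

CLUSTERINGS = ["kmn", "agg"]
VECTOR_TYPES = ["bert", "attn", "both"]
METHODS = ["mean", "max", "heads"]
TOPK = [1, 2, 5, 10]
CUTOFF = [0.3, 0.5, 0.7, 0.9]
FREQ = [10, 20, 50, 100]


def configurations():
    """Yield one dictionary per candidate WSI configuration."""
    for clustering, vector_type in product(CLUSTERINGS, VECTOR_TYPES):
        base = {"clustering": clustering, "vector_type": vector_type}
        if vector_type == "bert":
            # The vocabulary parameters do not apply to plain BERT vectors.
            yield dict(base)
            continue
        for method in METHODS:
            options = [("p_vocab_topk", v) for v in TOPK]
            options += [("p_vocab_freq", v) for v in FREQ]
            if method != "heads":
                options += [("p_vocab_cutoff", v) for v in CUTOFF]
            for name, value in options:
                yield {**base, "p_vocab_method": method, name: value}


if __name__ == "__main__":
    print(len(list(configurations())))
```

Under these assumptions the grid contains 130 configurations: 2 clustering methods times (1 bert setting plus 2 vector types times 32 vocabulary settings).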
- Translation errors that cause the respective sentences to be reassigned to other clusters or to form new clusters.
- The translation can be a hypernym of our lemma of interest and thus encompass different senses in the source language.
- There were 81 instances of DEX ID ‘1’, which maps to Princeton WordNet’s synset “aerial—existing or living or growing or operating in the air”.
- There were 869 instances of DEX ID ‘2’, which maps to Princeton WordNet’s synset “air—travel via aircraft”.
- There were five instances of DEX ID ‘3’, which maps to Princeton WordNet’s synset “aerial—characterized by lightness and insubstantiality: as impalpable or intangible as air”.
- There were three instances of DEX ID ‘4’, which is the same sense as the Collins Dictionary [31] “absent-minded—forgets things or does not pay attention to what they are doing, often because they are thinking about something else”.
- There were 38 instances of DEX ID ‘5’, which maps to Princeton WordNet’s synset “respiratory tract, airway—the passages through which air enters and leaves the body”.
- There were 4749 instances of DEX ID ‘2’ (this is cluster ID ‘0’; see Table 7 below for how WSD can correctly map this sense ID to this cluster ID).
- There were 83 instances with a sub-sense of DEX ID ‘2’ related to aerial battles or attacks (cluster ID ‘1’).
- There were 168 instances of DEX ID ‘1’ (cluster ID ‘2’).
- There were 4634 instances of DEX ID ‘2’.
- There were 148 instances of DEX ID ‘1’.
- There were 128 instances with a sub-sense of DEX ID ‘2’ related to air forces.
- There were 90 instances with a sub-sense of DEX ID ‘2’ related to aerial rescuing missions.
5. Results and Discussion
- Number of Romanian and English clusters determined with the V-Measure overlap method.
- Number of Romanian and English clusters in the 200-example gold standard.
- V-Measure of cluster overlapping.
- Paired F-score overlap [12].
- WSD accuracy and baseline WSD accuracy.
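As an illustration of the paired F-score listed above, the sketch below follows the pair-counting definition used in SemEval-2010 Task 14 [12]: every pair of instances placed in the same group counts as one item, and precision and recall are computed over the pairs shared by the two groupings. The function names are ours, and returning 0.0 when either grouping has no same-group pair is our convention.

```python
from itertools import combinations


def same_group_pairs(labels):
    """All index pairs (i, j), i < j, whose instances share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def paired_f_score(cluster_labels, reference_labels):
    """Harmonic mean of pair precision and pair recall between two groupings."""
    cluster_pairs = same_group_pairs(cluster_labels)
    reference_pairs = same_group_pairs(reference_labels)
    if not cluster_pairs or not reference_pairs:
        return 0.0
    common = len(cluster_pairs & reference_pairs)
    precision = common / len(cluster_pairs)
    recall = common / len(reference_pairs)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Swapping the two arguments swaps precision and recall but leaves the F-score unchanged, so the measure can also be used to compare two clusterings with no designated gold side.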
6. Conclusions
Author Contributions
Data Availability Statement
Conflicts of Interest
References
- [1] Vaswani, A.; Jones, L.; Shazeer, N.; Parmar, N.; Gomez, A.N.; Uszkoreit, J.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- [2] Hello GPT-4o. Available online: https://openai.com/index/hello-gpt-4o/ (accessed on 10 October 2024).
- [3] Gemini Models. Available online: https://deepmind.google/technologies/gemini/ (accessed on 10 October 2024).
- [4] Schütze, H. Automatic word sense discrimination. Comput. Linguist. 1998, 24, 97–123.
- [5] Song, X.; Salcianu, A.; Song, Y.; Dopson, D.; Zhou, D. Fast WordPiece Tokenization. arXiv 2020, arXiv:2012.15524.
- [6] Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019.
- [7] Tufiș, D.; Ion, R.; Ide, N. Fine-Grained Word Sense Disambiguation Based on Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets. In Proceedings of the 20th International Conference on Computational Linguistics COLING 2004, Geneva, Switzerland, 23–27 August 2004.
- [8] Tufiș, D.; Barbu Mititelu, V. The Lexical Ontology for Romanian. In Language Production, Cognition, and the Lexicon, Series Text, Speech and Language Technology; Gala, N., Rapp, R., Bel-Enguix, N., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; Volume 48.
- [9] Fellbaum, C. (Ed.) WordNet: An Electronic Lexical Database; MIT Press: Cambridge, MA, USA, 1998.
- [10] Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-shot Learning. ACM Comput. Surv. (CSUR) 2020, 53, 1–34.
- [11] Raganato, A.; Camacho-Collados, J.; Navigli, R. Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Valencia, Spain, 3–7 April 2017.
- [12] Manandhar, S.; Klapaftis, I.P.; Dligach, D.; Pradhan, S.S. SemEval-2010 Task 14: Word Sense Induction & Disambiguation. In Proceedings of the 5th International Workshop on Semantic Evaluation, ACL 2010, Uppsala, Sweden, 15–16 July 2010.
- [13] Word Sense Induction. Available online: https://paperswithcode.com/task/word-sense-induction (accessed on 17 October 2024).
- [14] Bartunov, S.; Kondrashkin, D.; Osokin, A.; Vetrov, D. Breaking Sticks and Ambiguities with Adaptive Skip-gram. PMLR 2016, 51, 130–138.
- [15] Sun, Y.; Rao, N.; Ding, W. A Simple Approach to Learn Polysemous Word Embeddings. arXiv 2017, arXiv:1707.01793.
- [16] Huang, E.H.; Socher, R.; Manning, C.D.; Ng, A.Y. Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics Volume 1: Long Papers, Jeju, Republic of Korea, 8–14 July 2012.
- [17] Amplayo, R.K.; Hwang, S.; Song, M. AutoSense Model for Word Sense Induction. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu, HI, USA, 27 January–1 February 2019.
- [18] Eyal, M.; Sadde, S.; Taub-Tabib, H.; Goldberg, Y. Large Scale Substitution-based Word Sense Induction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics Volume 1: Long Papers, Dublin, Ireland, 22–27 May 2022.
- [19] Ansell, A.; Bravo-Marquez, F.; Pfahringer, B. PolyLM: Learning about Polysemy through Language Modeling. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, Online, 19–23 April 2021.
- [20] Chawla, A.; Mulay, N.; Bishnoi, V.; Dhama, G.; Singh, A.K. A Comparative Study of Transformers on Word Sense Disambiguation. In Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science; Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N., Eds.; Springer: Cham, Switzerland, 2021; Volume 1516.
- [21] Vandenbussche, P.-Y.; Scerri, T.; Daniel, R., Jr. Word Sense Disambiguation with Transformer Models. In Proceedings of the 6th Workshop on Semantic Deep Learning (SemDeep-6), Online, 8 January 2021.
- [22] Tripodi, R.; Navigli, R. Game Theory Meets Embeddings: A Unified Framework for Word Sense Disambiguation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019.
- [23] Academia Română, Institutul de Lingvistică “Iorgu Iordan - Al. Rosetti”. DEX - Dicționarul Explicativ al Limbii Române; Univers Enciclopedic: București, România, 2016.
- [24] Tufiș, D.; Barbu Mititelu, V.; Irimia, E.; Păiș, V.; Ion, R.; Diewald, N.; Mitrofan, M.; Onofrei, M. Little Strokes Fell Great Oaks. Creating CoRoLA, The Reference Corpus of Contemporary Romanian. RRL 2019, 64, 227–240.
- [25] Ion, R. (Ed.) Evaluating and User-Testing Rodna, A New Romanian Text Processing Pipeline; Research report; Romanian Academy: Bucharest, Romania, 2022.
- [26] Mistral. Available online: https://ollama.com/library/mistral (accessed on 24 October 2024).
- [27] Scikit-Learn. Available online: https://scikit-learn.org/1.5/index.html (accessed on 22 October 2024).
- [28] Masala, M.; Ruseti, S.; Dascalu, M. RoBERT - A Romanian BERT Model. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020.
- [29] Readerbench/RoBERT-Small. Available online: https://huggingface.co/readerbench/RoBERT-small (accessed on 22 October 2024).
- [30] FacebookAI/Roberta-Base. Available online: https://huggingface.co/FacebookAI/roberta-base (accessed on 22 October 2024).
- [31] Collins Dictionary. Available online: https://www.collinsdictionary.com/dictionary/english (accessed on 23 October 2024).
- [32] Elmakias, I.; Vilenchik, D. An Oblivious Approach to Machine Translation Quality Estimation. Mathematics 2021, 9, 2090.
- [33] Moosa, I.M.; Zhang, R.; Yin, W. MT-Ranker: Reference-free machine translation evaluation by inter-system ranking. In Proceedings of the 12th International Conference on Learning Representations, ICLR 2024, Vienna, Austria, 7–11 May 2024.
- [34] Readerbench/RoBERT-Base. Available online: https://huggingface.co/readerbench/RoBERT-base (accessed on 6 January 2025).
- [35] Readerbench/RoBERT-Large. Available online: https://huggingface.co/readerbench/RoBERT-large (accessed on 6 January 2025).
- [36] FacebookAI/xlm-Roberta-Large. Available online: https://huggingface.co/FacebookAI/xlm-roberta-large (accessed on 6 January 2025).
Nouns | Frequency | Verbs | Frequency | Adjectives | Frequency | Adverbs | Frequency |
articol (article) | 2,671,344 | putea (can) | 2,659,565 | dat (given) | 1,810,237 | poate (maybe) | 1,197,761 |
caz (case) | 1,828,842 | publica (publish) | 2,606,670 | prevăzut (provided) | 1,623,604 | numai (only) | 561,452 |
lege (law) | 1,706,469 | prevedea (foresee) | 1,714,764 | public (public) | 1,487,763 | astfel (thus) | 474,963 |
dată (date) | 1,637,314 | face (do) | 1,668,716 | oficial (official) | 1,251,986 | doar (just) | 341,220 |
an (year) | 1,599,338 | trebui (must) | 1,199,171 | publicat (published) | 1,186,120 | bine (well) | 277,278 |
parte (part) | 1,397,337 | sta (stay) | 902,738 | național (national) | 1,070,671 | acum (now) | 224,377 |
persoană (person) | 1,362,166 | stabili (establish) | 811,111 | prezent (present) | 1,033,534 | apoi (then) | 222,453 |
stat (state) | 1,305,742 | da (give) | 780,371 | următor (next) | 901,561 | așa (so) | 208,226 |
activitate (activity) | 1,200,049 | avea (have) | 726,615 | mare (big) | 824,501 | încă (yet) | 190,663 |
serviciu (job) | 1,102,383 | modifica (modify) | 701,571 | medical (medical) | 783,207 | aici (here) | 181,573 |
Nouns | Frequency | Verbs | Frequency | Adjectives | Frequency | Adverbs | Frequency |
pondere (weight) | 24,398 | abilita (authorize) | 31,372 | oficial (official) | 1,251,986 | aci (here) | 4481 |
caiet (notebook) | 22,203 | disputa (play) | 12,366 | unic (unique) | 273,681 | orbește (blindly) | 1201 |
incintă (premise) | 18,329 | dispera (despair) | 8111 | cult (cultivated) | 58,432 | zdravăn (healthy) | 1068 |
relief (terrain) | 8377 | recepționa (receive) | 6900 | aerian (aerial) | 54,152 | omenește (humanly) | 831 |
codru (forest) | 7027 | răci (cool) | 6135 | conform (consistent) | 34,861 | ||
puț (well) | 4804 | depărta (separate) | 6060 | reprezentativ (representative) | 27,291 | ||
papuc (slipper) | 2900 | gripa (grind to a halt) | 2847 | verbal (verbal) | 26,467 | ||
brumă (frost) | 2558 | înseta (long for water) | 2436 | vegetal (vegetable) | 25,295 | ||
ansă (loop) | 1536 | parveni (become rich) | 2010 | sectorial (sectorial) | 23,094 | ||
săpuneală (scolding) | 56 | înfrăți (bond) | 989 | liric (lyrical) | 17,960 |
Lemmas | Annotated | Cases A + D | Case B | Case C |
Nouns | ||||
pondere | 199 | 1 | 0 | 0 |
caiet | 200 | 0 | 0 | 0 |
incintă | 192 | 1 | 2 | 5 |
relief | 192 | 7 | 1 | 0 |
codru | 179 | 21 | 0 | 0 |
puț | 165 | 10 | 9 | 16 |
papuc | 145 | 55 | 0 | 0 |
brumă | 163 | 37 | 0 | 0 |
ansă | 170 | 30 | 0 | 0 |
săpuneală | 49 | 6 | 1 | 0 |
Verbs | ||||
abilita | 198 | 2 | 0 | 0 |
disputa | 198 | 1 | 0 | 1 |
dispera | 187 | 12 | 0 | 1 |
recepționa | 200 | 0 | 0 | 0 |
răci | 192 | 8 | 0 | 0 |
depărta | 199 | 1 | 0 | 0 |
gripa | 52 | 148 | 0 | 0 |
înseta | 198 | 2 | 0 | 0 |
parveni | 170 | 2 | 26 | 2 |
înfrăți | 200 | 0 | 0 | 0 |
Adjectives | ||||
oficial | 200 | 0 | 0 | 0 |
unic | 199 | 0 | 1 | 0 |
cult | 94 | 106 | 0 | 0 |
aerian | 996 | 3 | 0 | 1 |
conform | 198 | 2 | 0 | 0 |
reprezentativ | 196 | 0 | 4 | 0 |
verbal | 197 | 1 | 1 | 0 |
vegetal | 200 | 0 | 0 | 0 |
sectorial | 200 | 0 | 0 | 0 |
liric | 197 | 2 | 1 | 0 |
Adverbs | ||||
aci | 175 | 6 | 0 | 19 |
orbește | 45 | 155 | 0 | 0 |
zdravăn | 177 | 0 | 2 | 21 |
omenește | 194 | 6 | 0 | 0 |
Total | 6716 | 619 | 48 | 66 |
Lemmas | Translations |
Nouns | |
pondere | weight, share, proportion, weighting, percentage, significant, number, great, rate, population |
caiet | notebook, sheet, job, file, folder, task, taskbook, book, notepad, questionnaire |
incintă | premise, building, enclosure, compound, facility, area, container, fortification, room, chamber |
relief | relief, terrain, landscape, Romanian, hilly, character, mountainous, raise, hill, topography |
codru | forest, codru, codrul, codrii, wood, codrului, codri, codrilor, bread |
puț | well, pit, puț, shaft, hole, oil, number, water, wells, tank |
papuc | slipper, papuc, shoe, sandal, papuci, rubber, papucii, boot |
brumă | frost, fog, snow, mist, veil, autumn, haze, winter, frosty, cold |
ansă | ansa, jejunal, loop, anastomosis, annex, diathermic, parallel, intestinal, anus, year |
săpuneală | needle, soap, slap, face, sapooning, weekly, umbrella, shovel, razor, puddle |
Verbs | |
abilita | ability, authorize, capable, enable, able, competent, empower, qualify, law, skill |
disputa | dispute, play, match, hold, debate, contest, two, final, place |
dispera | desperate, desperately, despair, despairingly, disappointed, despairing, disappoint, sadly |
recepționa | receive, reception, accept, refer, separately, message, information, electronic, works |
răci | cool, cold, down, get, freeze, cooling, chill, temperature, refrigerate |
depărta | away, depart, leave, remove, far, distance, distant, withdraw, apart, separate |
gripa | sick, gripat, engine, gripe, grind to a halt, falter, economic, word, stir, stiff |
înseta | hungry, craving, starve, eager, insatiable, yearn, long, enchant, thirsty, famish |
parveni | reach, come, arrive, receive, succeed, parvin, become, climb, person |
înfrăți | twin, friend, connect, link, fraternize, bond, unite, city, together |
Adjectives | |
oficial | official, oficial, august, officially, article, translate, publish |
unic | unique, unic, number, publish, only, use, payment |
cult | cult, culture, faith, Christian, cultured, religious, worship, pious, church, religion |
aerian | air, aerial, aviation, airline, airspace, aerian, airway, airborne, aeriene |
conform | accordance, conform, conformity, accord, copy, line, compliance, council, conforming |
reprezentativ | representative, representation, represent, representatively, renown, representational, national, team, Romanian |
verbal | verbal, verbally, verb, reception, orally, process, procedural, note, minute, write |
vegetal | vegetable, vegetal, plant, vegetation, animal, product, origin, production, agricultural, vegetarian |
sectorial | sectorial, sectoral, sectorially, sector, development, pension, economic, strategy, operational, program |
liric | lyrical, lyric, literary, poetry, poetic, lyricism, lyrically, literature, lyricist, poet, poem |
Adverbs | |
aci | here, aci, there, one, place, only |
orbește | orbește, blindly, shamelessly |
zdravăn | healthy, zdravăn, healthily, health, sick |
omenește | human, person, humanly, humanity, humanely, humanize, humane, man, humankind, humanizing |
Clustering | Vector Type | p_vocab_method | p_vocab_topk | p_vocab_cutoff | p_vocab_freq | RO clusters | EN clusters | VM (%)
kmn | bert | n/a | n/a | n/a | n/a | 4 | 5 | 7.41 |
agg | bert | n/a | n/a | n/a | n/a | 5 | 4 | 3.38 |
kmn | both | mean | n/a | n/a | 20 | 5 | 5 | 9.08 |
agg | both | mean | n/a | n/a | 10 | 5 | 4 | 3.38 |
kmn | attn | mean | n/a | 0.7 | n/a | 2 | 3 | 13.01 |
agg | attn | mean | n/a | n/a | 100 | 5 | 4 | 18.8 |
0.02% | 0.01% | 0.24% | 0.28% | |
0.4% | 0.7% | 8% | 7.47% | |
0.64% | 0.95% | 7.64% | 7.21% | |
0.44% | 0.88% | 6.28% | 6.11% |
Sense | Cluster ID ‘0’ | Cluster ID ‘1’ | Cluster ID ‘2’ |
DEX ID ‘1’ | 0.184 | 0.283 | 0.293 |
DEX ID ‘2’ | 0.301 | 0.193 | 0.22 |
DEX ID ‘3’ | 0.058 | 0.117 | 0.006 |
DEX ID ‘4’ | 0.231 | 0.007 | 0.087 |
DEX ID ‘5’ | 0.226 | 0.4 | 0.393 |
Sense | Cluster ID ‘0’ |
DEX ID ‘1’ | 0.174 |
DEX ID ‘2’ | 0.23 |
DEX ID ‘3’ | 0.127 |
DEX ID ‘4’ | 0.244 |
DEX ID ‘5’ | 0.225 |
Lemma | RO clusters | EN clusters | VM (%) | FS (%) | GS senses | WSD acc. (%) | BL WSD acc. (%)
Nouns | |||||||
pondere | 3 | 3 | 0.23 | 97.2 | 3 | 86 | 86 |
caiet | 2 | 2 | 0 | 97.3 | 2 | 80.5 | 80.5 |
incintă | 3 | 3 | 1.43 | 85.3 | 3 | 63.4 | 43.8 |
relief | 3 | 3 | 0.4 | 90.9 | 3 | 8.8 | 38.9 |
codru | 2 | 2 | 0.07 | 97.7 | 2 | 97.2 | 97.2 |
puț | 2 | 2 | 0.35 | 96.4 | 2 | 57.5 | 58 |
papuc | 2 | 2 | 1 | 91.9 | 2 | 99 | 99 |
brumă | 3 | 3 | 40.74 | 84.5 | 3 | 29.4 | 46 |
ansă | 4 | 4 | 2.56 | 75.8 | 4 | 65.3 | 93.5 |
săpuneală | 3 | 2 | 11.5 | 72 | 3 | 92 | 10 |
Verbs | |||||||
abilita | 2 | 2 | 0.28 | 92.4 | 2 | 99 | 99 |
disputa | 3 | 3 | 0.23 | 95.5 | 3 | 61.6 | 29.3 |
dispera | 2 | 2 | 0.15 | 97.8 | 2 | 89.3 | 89.3 |
recepționa | 2 | 2 | 0.1 | 97.5 | 2 | 33 | 33 |
răci | 3 | 3 | 0.13 | 94.1 | 3 | 71.4 | 71.4 |
depărta | 3 | 3 | 0.46 | 91.8 | 3 | 7.5 | 69.8 |
gripa | 2 | 2 | 0.87 | 92.6 | 2 | 88.5 | 11.5 |
înseta | 2 | 2 | 0.1 | 98.5 | 2 | 22.2 | 77.8 |
parveni | 2 | 2 | 32.75 | 93.4 | 4 | 27.6 | 20.9 |
înfrăți | 2 | 2 | 100 | 100 | 2 | 99 | 99 |
Adjectives | |||||||
oficial | 3 | 3 | 0 | 99.9 | 3 | 93.5 | 93.5 |
unic | 2 | 2 | 0 | 99.9 | 2 | 94.5 | 94.5 |
cult | 3 | 3 | 0.49 | 63.8 | 3 | 59.6 | 59.6 |
aerian | 3 | 4 | 8 | 89.7 | 5 | 83.9 | 0.3 |
conform | 2 | 2 | 0 | 65.7 | 2 | 0.5 | 42.9 |
reprezentativ | 2 | 2 | 0 | 99.3 | 2 | 48.5 | 53.5 |
verbal | 2 | 2 | 1.72 | 72.5 | 2 | 76.3 | 76.3 |
vegetal | 3 | 3 | 2.72 | 93.7 | 3 | 88.5 | 88.5 |
sectorial | 2 | 2 | 0 | 99.5 | 2 | 99 | 99 |
liric | 4 | 4 | 0.36 | 92.4 | 4 | 2.5 | 0.5 |
Adverbs | |||||||
aci | 2 | 2 | 0.32 | 96.4 | 2 | 70.3 | 73.7 |
orbește | 2 | 2 | 0.47 | 95.4 | 2 | 88.9 | 91.1 |
zdravăn | 2 | 2 | 0.06 | 96.4 | 2 | 52 | 48 |
omenește | 2 | 2 | 0.06 | 95.1 | 2 | 38.1 | 38.1 |
Average | n/a | n/a | n/a | n/a | n/a | 63.9 | 62.1 |
Nouns | Translation acc. | Verbs | Translation acc. | Adjectives | Translation acc. | Adverbs | Translation acc. |
pondere (weight) | 92.8% | abilita (ability) | 96.3% | oficial (official) | 73.5% | aci (here) | 46.3% |
caiet (notebook) | 90.8% | disputa (dispute) | 93.4% | unic (unique) | 63.1% | orbește (blindly) | 11.1% |
incintă (premise) | 83% | dispera (desperate) | 85.8% | cult (culture) | 19.4% | zdravăn (healthy) | 43.1% |
relief (terrain) | 9.3% | recepționa (receive) | 95.2% | aerian (air) | 92.8% | omenește (human) | 72.1% |
codru (forest) | 44.8% | răci (cool) | 83.9% | conform (consistent) | 31.7% | ||
puț (well) | 83.6% | depărta (away) | 84.7% | reprezentativ (representative) | 94.8% | ||
papuc (slipper) | 59.1% | gripa (sick) | 69% | verbal (verbal) | 90.7% | ||
brumă (frost) | 40.5% | înseta (craving) | 79.5% | vegetal (vegetable) | 93% | ||
ansă (loop) | 42.5% | parveni (reach) | 78.9% | sectorial (sectorial) | 96% | ||
săpuneală (soap) | 26% | înfrăți (friend) | 80.5% | liric (lyrical) | 88.4% |
Averages | Nouns | Verbs | Adjectives | Adverbs |
Translation accuracy | 57.2% | 84.7% | 74% | 43.1% |
V-Measure | 5.8 | 13.5 | 1.3 | 0.2 |
WSD accuracy | 67.9% | 59.9% | 64.6% | 62.3% |
Baseline WSD accuracy | 65.2% | 60.1% | 60.8% | 62.7% |
Number of senses in GS | 2.7 | 2.5 | 2.8 | 2 |
Lemma/POS | RoBERT-Small | RoBERT-Base | RoBERT-Large | XLM-RoBERTa-Large | ||||
relief/n | 8.8% | 38.9% | 38.9% | 4.7% | 38.3% | 4.7% | 0% | 0% |
depărta/v | 7.5% | 69.8% | 66.8% | 24.6% | 68.8% | 24.6% | 68.8% | 0% |
conform/a | 0.5% | 42.9% | 56.1% | 42.9% | 42.4% | 56.6% | 42.9% | 0% |
aci/r | 70.3% | 73.7% | 70.3% | 73.7% | 70.3% | 25.7% | 0% | 0% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ion, R.; Păiș, V.; Mititelu, V.B.; Irimia, E.; Mitrofan, M.; Badea, V.; Tufiș, D. Unsupervised Word Sense Disambiguation Using Transformer’s Attention Mechanism. Mach. Learn. Knowl. Extr. 2025, 7, 10. https://doi.org/10.3390/make7010010