Word Sense Disambiguation Using Clustered Sense Labels
Round 1
Reviewer 1 Report
The work reports a proposal for compressing the sense vocabulary
without using a thesaurus, testing it on both English and Korean
corpora. The authors argue that its performance is far superior to
that of the uncompressed sense model and comparable to that of the
thesaurus-based model.
INTRODUCTION. The authors clearly and briefly explain the peculiarities and
difficulties of the problem, as well as the relevance of the
proposal. The section also serves as a brief introduction to the
state of the art, contextualizing the approach and indicating the
weaknesses to be resolved in relation to previous proposals. The
comparative focus is on SVC (sense vocabulary compression) methods,
and the contributions are clearly explained.
SENSE DEFINITION CLUSTERING. The authors take the concept of Sense
Definition Vector (SDV) as the primary source data for their proposal,
then introduce a similarity measure (in fact, the normalized
Euclidean distance) to cluster SDVs: two groups are merged once their
similarity is above a threshold (2) and the merged group does not
include more than one sense of the same word. The authors also provide
the time complexity and practical considerations. The exposition is
clear and understandable, adequately illustrated with graphics.
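For concreteness, a minimal sketch of such a threshold-based, homograph-aware merge follows; the similarity normalization, the complete-linkage choice, the threshold value, and all names are illustrative assumptions rather than the authors' exact procedure.

```python
# Illustrative sketch of threshold-based clustering of sense definition
# vectors (SDVs), keeping senses of the same word in separate clusters.
import numpy as np

def normalized_euclidean_similarity(u, v):
    """Similarity derived from the Euclidean distance between L2-normalized
    vectors; lies in [0, 1], with 1 meaning identical direction."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return 1.0 - np.linalg.norm(u - v) / 2.0

def cluster_sdvs(senses, threshold=0.8):
    """Greedy agglomerative merging: repeatedly join the most similar pair of
    clusters while (a) their similarity exceeds the threshold and (b) the
    merged cluster would not contain two senses of the same lemma."""
    clusters = [[s] for s in senses]  # each sense: (sense_id, lemma, vector)
    while True:
        best, best_sim = None, threshold
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if {s[1] for s in clusters[i]} & {s[1] for s in clusters[j]}:
                    continue  # would merge two senses of one homograph
                sim = min(normalized_euclidean_similarity(a[2], b[2])
                          for a in clusters[i] for b in clusters[j])  # complete linkage
                if sim > best_sim:
                    best, best_sim = (i, j), sim
        if best is None:
            return clusters
        i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)
```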
DEEP-LEARNING MODEL FOR WORD SENSE DISAMBIGUATION. Taking BERT as a
basis, the work introduces a sequence labeling model by adding a
transformer encoder on top, in order to capture overall sentence
information.
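A minimal sketch of this kind of architecture follows, assuming PyTorch and the Hugging Face transformers library; the model name, number of heads, and number of extra layers are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: BERT with an extra transformer encoder on top, followed by a
# per-token classifier over (compressed) sense labels.
import torch
from torch import nn
from transformers import BertModel

class SenseTagger(nn.Module):
    def __init__(self, num_sense_labels, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)  # sentence-level mixing
        self.classifier = nn.Linear(hidden, num_sense_labels)      # one label per token

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # True entries in the padding mask are ignored by the extra encoder.
        states = self.encoder(states, src_key_padding_mask=attention_mask.eq(0))
        return self.classifier(states)  # (batch, seq_len, num_sense_labels)
```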
EXPERIMENT AND RESULT. The authors first describe the experimental
setting, including the training and testing corpora chosen for both
English and Korean, the BERT hyper-parameters and base models, and the
dictionaries and thesauri used in each case. They then adopt the
evaluation methods used in SemEval, focusing on the effect of both
corpus size and threshold selection.
DISCUSSION. The discussion should be more elaborate. The impression is
that the justifications for the different behaviors observed in
English and Korean are superficial. In particular, it is
surprising that no reference is made to the two languages' different
linguistic natures, while the discussion focuses on purely quantitative
aspects (baseline performance, percentage of untrained vocabulary).
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The paper entitled “Word Sense Disambiguation using Clustered Sense Labels” proposes a model for compressing the sense vocabulary without using a thesaurus. The method converts sense definitions in a dictionary into sentence vectors and hierarchically clusters those vectors into compressed senses while remaining aware of homographs. The experiments were conducted on the English Senseval and SemEval datasets and the Korean Sejong sense-tagged corpus to compare the performance of the proposed compressed sense model with uncompressed and thesaurus-based models.
In general, the topic investigated in this paper is interesting and suitable for the scope of the AS journal. The paper is well written, and the results seem reasonable. The Introduction is thorough and covers several of the latest works in this field. I like the way the authors tell the story in the Introduction, starting from the definition of word sense disambiguation, moving through the importance of machine learning approaches and a classification of the methods in this field, and finally leading to the need for a model that compresses sense vocabularies without using a thesaurus.
In support of this paper, the authors should revise it to further improve its quality before I vote for acceptance. My comments are as follows:
- In the Introduction, draw a figure of the workflow to illustrate the main idea presented in this paper. I suggest a thorough workflow from the input to the output of the model.
- In Section 2, insert a table of the notations and abbreviations used in the paper.
- In Section 2, I wonder why the authors need to use several clustering algorithms for sense clustering; why not simply use k-means for the whole clustering phase? HAC gives high interpretability in its clustering results but does not scale well to large datasets (O(n^3)), whereas k-means can deal with large-scale datasets and produces good clustering results if the number of clusters can be determined well (see the first sketch after this list). Thus, I suggest the authors explain in more detail the mechanism of using different methods and why using only k-means is not enough to reach the target.
- In Section 4.2, I suggest the authors show the standard deviation (±) of the F1 scores.
- In Section 5, although the finding of this paper is supported by the results, the authors should discuss possible alternative methods for this dataset. I still think there are several ways to improve the sense clustering performance, at least in terms of computation. First, there are several works in the literature that use HAC in combination with metrics such as the silhouette coefficient and the Calinski-Harabasz index to estimate the optimal number of clusters (see the second sketch after this list). This could reduce the use of affinity propagation, which may increase the complexity of the whole model. Here are several good examples that the authors could refer to in the discussion: [https://doi.org/10.3390/app112311122] and [https://doi.org/10.1007/978-981-15-1209-4_1].
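First sketch, on the scalability contrast raised in the Section 2 comment: the data, dimensionality, and cluster counts below are illustrative assumptions, assuming scikit-learn.

```python
# HAC builds a full merge tree (roughly O(n^3) time, O(n^2) memory) and is easy
# to interpret; k-means is O(n * k * d) per iteration and scales to large sense
# vocabularies once k is fixed.
import time
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=2000, n_features=64, centers=8, random_state=0)

t0 = time.perf_counter()
hac_labels = AgglomerativeClustering(n_clusters=8, linkage="average").fit_predict(X)
t1 = time.perf_counter()
km_labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)
t2 = time.perf_counter()

print(f"HAC:     {t1 - t0:.2f}s for {len(X)} points")
print(f"k-means: {t2 - t1:.2f}s for {len(X)} points")
```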
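Second sketch, on the Section 5 suggestion: estimate the number of clusters by cutting the HAC tree at several values of k and keeping the one with the best internal validity index, instead of running a separate algorithm such as affinity propagation. Again the data and the range of k are illustrative assumptions, assuming scikit-learn.

```python
# Pick k for HAC via internal validity indices; higher is better for both.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, calinski_harabasz_score

X, _ = make_blobs(n_samples=500, n_features=64, centers=6, random_state=0)

best_k, best_sil = None, -1.0
for k in range(2, 15):
    labels = AgglomerativeClustering(n_clusters=k, linkage="average").fit_predict(X)
    sil = silhouette_score(X, labels)
    ch = calinski_harabasz_score(X, labels)
    print(f"k={k:2d}  silhouette={sil:.3f}  Calinski-Harabasz={ch:.1f}")
    if sil > best_sil:
        best_k, best_sil = k, sil
print(f"estimated k = {best_k}")
```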
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
The paper proposes a method for compressing the sense vocabulary that is NOT based on a thesaurus hierarchy, to be used in WSD. The method would be valuable if it yielded better results than the existing ones that DO rely on such thesaurus hierarchies. However, this is not the case here. The authors end the paper by telling us clearly: "Our proposed method will be useful for finding the optimal sense vocabulary without using a manually constructed thesaurus". But such thesauri exist nowadays for most languages, so why not use them, especially if they provide better results for the disambiguation task? What is the point of using only sense definitions when semantic networks and knowledge bases of the WordNet type, with the corresponding sense hierarchies, exist for most languages? (WordNets for over 50 languages are officially acknowledged by the Global WordNet Association.) An essential point made in the WSD literature since the Adapted/Extended Lesk algorithm (Banerjee and Pedersen 2002, 2003) is that of using a semantic network (such as WordNet) rather than a traditional dictionary, in order to exploit the knowledge brought in by the great number of semantic relations that such a network provides. Should we really go back to what we had before this and, if so, why, more precisely? An enormous amount of research in WSD that relies precisely on WordNet semantic relations has meanwhile been conducted. Even recent publications, in very serious venues, follow this approach. See, for instance:
Wang, Y., Wang, M., Fujita, H., Word Sense Disambiguation: A comprehensive knowledge exploitation framework. Knowledge-Based Systems, 190 (2020).
To summarize, the exploitation of WordNet relations remains a central issue on which the success of WSD heavily relies. Giving up the use of such relations would make sense only if these networks did not exist or if the obtained disambiguation results were of better quality, which is not the case here.
Aside from this, the paper is well written, and the experiment is well conducted and described. The Introduction should be extended, with more references included. For instance, the authors refer to a WSD overview that is represented by Reference [1] from 2009. More recent overviews exist, from 2014 and 2018, to name just a few:
Vidhu Bhala R. V., Abirami S., Trends in word sense disambiguation. Artificial Intelligence Review, 2014; 42(2), pp. 159-171.
Popov A., Neural network models for word sense disambiguation: an overview. Cybern. Inf. Tech., 2018; 18(1), pp. 139-151.
As to the year 2009, mentioned in the text, there is a famous review from that very same year that should also be mentioned (should we wish to go that far back in time):
Navigli R., Word sense disambiguation: a survey. ACM Comput. Surv., 2009; 41(2):10:1-10:69.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
I have checked this revision. The authors have improved the quality of their paper following my comments. In Section 2, Equation 1, I suppose it should be \sum_{i=1}^{N} instead of \sum_{i=1}^{n}; otherwise, please add a description of n.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf