Can Triplet Loss Be Used for Multi-Label Few-Shot Classification? A Case Study
Round 1
Reviewer 1 Report
The authors present results from experiments testing triplet-trained Siamese networks for multi-label classification on a dataset containing Hungarian legal decisions of administrative agencies in tax matters, labelled by five major legal content providers. The results show that these models can learn from a few examples but can be confused by overlapping labels. Although this topic appears in blog posts, research publications are few. Overall, the paper is sound and especially detailed in the Methods and Results/Discussion sections. The introduction falls a bit short and needs some streamlining.
Abstract
Consider using past tense instead of present tense
1-4: Sentence too long and can be improved language-wise
7-9: Rephrase. ", and the overlap between labels affects the results negatively" does not grammatically match the first part of the sentence
Introduction
14: Change along the lines of: often been inspired by nature to find solutions for solving complex problems.
18: Remove 'similarly'
24: Please rephrase "who cannot provide higher added value in the meantime".
26: Remove 'In this case' and just start with However,...
27: Replace 'regularities' with 'properties'
29-33: Although I agree that a Platypus is an amazing animal, I wonder if everyone knows this animal by that name. Again, that sentence is more complex than needed.
If you want your example to match lines 32-36, I suggest you set it up a bit differently:
My suggestion is something along the lines of:
In humans, seeing a few examples is usually enough to understand a concept and to apply it in new situations. Even children quickly learn that a hairy, four-legged, barking animal is most likely a dog, although they have only seen a few dogs in the neighborhood. In addition, they are also able to tell that a cat is more similar to a dog than it is to a human, even if they have never seen a cat before.
37: Introduce the acronym LLM, as you keep referring to LLMs below
40: Remove 'in good quality'. "zero-shot [16], and even multi-label few-shot classification [6]."
41-43: Avoid "easily available"; what does that mean? Desktop and laptop? Then say so. For example: 'LLMs do require computational resources that exceed blabla....'
Cloud platforms are easily available these days for almost everyone. The privacy issue is another topic.
47: "Machine learning models can only really spread..." - ML models are not viruses ;-)! Please rephrase in a more technical way.
49: "encounter machine learning-based solutions when using specific legal software, such as legal databases." Why is a legal database a machine-learning based solution? The reader does not know the details. A database is just a database. There can be a software on top that exploits the database using ML or maybe there are even software that offer the full package, but is not fully clear and as is sounds incomplete. Please elaborate a bit more this paragraph.
50: "problems;", replace semi-colon with colon
65: Again, the sentence can be improved along the lines of: "In this study, we present how and with which limitations a triplet-trained few-shot classifier model can be extended to perform multi-label categorization on short legal documents." I am also not a native speaker, but I came across several sentences that feel grammatically incorrect, or at least that a native speaker would not phrase that way. I suggest you have this MS edited by a native speaker. In addition, avoid one-sentence paragraphs, as they are bad style.
68: "Two experiments...". I suggest you number: 1) Performing multi-label classification using a small labeled dataset and 2) testing how different Siamese solutions work to classify a new category unavailable during training in a binary classification setting.
Again, I suggest using past tense for what you did throughout the MS.
73-76: Is this journal style? This reads very unusual to me, and I suggest removing it.
Relevant works
Content seems good to me. Remember: No one-sentence paragraphs
87: Replace 'own' with "original"
134: Consider putting 'The Avg. coverage by BERT' in quotation marks or in italics
146: 5 to five
3.3: I like this visual
Figure 2 seems to be in the wrong place; it should be in Section 4.2, but I guess this will change anyway during typesetting
104: Reference/URL to scikit-learn
230: Any rationale for setting it to 0.2?
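If the 0.2 here is the margin of the triplet loss (my assumption, since that is the usual hyperparameter in triplet-trained Siamese setups), please say so explicitly and justify the choice. For reference, a minimal sketch of where such a margin enters the loss, assuming a PyTorch-style implementation (illustrative only, not the authors' code):

import torch
import torch.nn as nn

# The margin forces the anchor-negative distance to exceed the
# anchor-positive distance by at least 0.2 before the loss reaches zero.
triplet_loss = nn.TripletMarginLoss(margin=0.2, p=2)

# Dummy embeddings: an anchor, a positive (same label) and a negative (different label).
anchor = torch.randn(8, 128)
positive = torch.randn(8, 128)
negative = torch.randn(8, 128)

loss = triplet_loss(anchor, positive, negative)
print(loss.item())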
234: Missing comma before 'however'
245: Change 'has' to 'have'
246-249: Nice!
253: while the ones with high
Throughout the Methods section, you again mix tenses; please check across the MS.
The Methods section is very complete and understandable. Nevertheless, some language editing could help to further improve readability.
Results
315-317: I like that you considered which bias the users prefer.
Remember: No one-sentence paragraphs
6.3: Instead of calling them traditional methods, name those methods directly so the reader does not have to jump back in the MS, or at least briefly remind the reader which models you refer to.
I have nothing to do with this publication but came across it:
https://dl.acm.org/doi/pdf/10.1145/3558100.3563843
The MS could benefit from language-editing.
Author Response
Dear Editors and Reviewer,
Please find the modifications detailed in the attached document.
Author Response File: Author Response.pdf
Reviewer 2 Report
In this paper, a few-shot approach is applied to classifying legal decisions in the Hungarian language. Siamese networks, which excel at extracting co-occurrences in few-shot scenarios, are used to solve the task. The models are compared with classical ML methods and outperform them (not a great discovery).
Nevertheless, the work is well written and understandable. It is a quick read, and one understands what the aim is. Figure 2 is much appreciated, as are the plots.
However, I would advise the authors to consider the role of prior knowledge in pre-trained language models (PLMs) such as BERT. This seems to be very important for achieving results even in zero-shot scenarios. In fact, I would recommend considering the following work: PreCog: Exploring the Relation between Memorisation and Performance in Pre-trained Language Models. This work could be helpful in understanding the domain contexts already known to pre-trained models.
Author Response
Dear Editors and Reviewer,
Please find the modifications detailed in the attached document.
Author Response File: Author Response.pdf