Article
Peer-Review Record

Examining Sentiment Analysis for Low-Resource Languages with Data Augmentation Techniques

Eng 2024, 5(4), 2920-2942; https://doi.org/10.3390/eng5040152
by Gaurish Thakkar *, Nives Mikelić Preradović * and Marko Tadić
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 13 September 2024 / Revised: 1 November 2024 / Accepted: 4 November 2024 / Published: 7 November 2024
(This article belongs to the Special Issue Feature Papers in Eng 2024)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

(1) The abstract is too short. The authors did not clearly state their work and originality in the abstract.

(2) In Table 1, why does Croatian not have the Val?

(3) The authors should plot the structures / architectures of the models.

(4) The authors should compare their method with SOTA methods. In addition, please provide and label the reference indices of the compared methods in the figures and tables, and then the readers can judge whether the compared methods are SOTA.

(5) In Section 5, many references are cited in the sub-sections, so it is difficult to distinguish the authors' work from existing work.

Comments on the Quality of English Language

None.

Author Response

Comments 1: The abstract is too short. The authors did not clearly state their work and originality in the abstract. 

Response 1: Thank you for pointing this out. We agree with the comment. Therefore, the abstract has been revised and can be found on line 17.

Comments 2: In Table 1, why does Croatian not have the Val? 

Response 2: The original dataset released by its authors did not include a validation set. We used 10% of the training set as a validation set for early stopping.
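The 10% hold-out described in Response 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_train_val`, the fixed seed, and the plain random (non-stratified) shuffle are all assumptions, since the response does not say how the 10% was selected.

```python
import random


def split_train_val(examples, val_fraction=0.1, seed=13):
    """Hold out a fraction of the training examples as a validation set.

    Hypothetical helper: the shuffle is random and not stratified by label.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")

    # Shuffle indices with a fixed seed so the split is reproducible.
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)

    # Take the first n_val shuffled indices as the validation set.
    n_val = max(1, round(len(examples) * val_fraction))
    val_idx = set(indices[:n_val])

    train = [ex for i, ex in enumerate(examples) if i not in val_idx]
    val = [ex for i, ex in enumerate(examples) if i in val_idx]
    return train, val
```

The validation portion would then be used only to monitor the loss for early stopping, while the original test set stays untouched.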

Comments 3: The authors should plot the structures / architectures of the models. 

Response 3: Thank you for this comment. The models are not original contributions of the study and have been taken from the existing repositories. Thus, they can be referred to by the readers in the original papers. 

Comments 4: The authors should compare their method with SOTA methods. In addition, please provide and label the reference indices of the compared methods in the figures and tables, and then the readers can judge whether the compared methods are SOTA. 

Response 4: Thank you for pointing this out. We agree with the comment. We have added labels that mark which results are ours and which are not, and we have prefixed the table labels with corresponding text to identify our work. Figures 1-4 have been updated with new scores and labels. We have also run experiments with the Gemma model as a state-of-the-art comparison.

Comments 5: In Section 5, many references are cited in the sub-sections, it is difficult to distinguish the authors’ works and existing works. 

Response 5: Thank you for pointing this out. We agree with the comment. We have cited all the existing works and added the keyword [ours] to the title to distinguish the authors' works from existing works. See Tables 6 and 7.

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for inviting me to review the manuscript. The topic is sentiment analysis for different languages. I have checked the iThenticate report and found that the current manuscript overlaps heavily with some articles that the authors may have published. The authors will need to review and revise the overlap before the manuscript can be considered for publication.

Other suggestions for the manuscript are listed below:

Title

The author may remove the word "examine" from the title, as examining data may be more appropriate.

Abstract

Clearer research objectives and questions can be added to the abstract.

Provide information about the data and selection criteria.

Some general findings can be included.

More details on practical and theoretical implications can be added.

Introduction

Define "neural network", "hyperparameters" and "training set-learned parameters" and explain them to the readers.

Authors can add references for lines 27 to 34.

Explain the reason for "making systems more resistant to adversarial attacks" on line 46.

The sentence "The reported methods for languages with rich linguistic resources are based on linguistic resources" on line 54 is confusing. Please revise and explain.

Please define and explain "Freebase" on line 55.

On line 62, a research objective can be added before talking about the method of analysis "This article compares...".

Please give a full form for "DA strategies" for the first time used by the author on line 63.

The choice of data (e.g. South Slavic languages) is interesting and innovative.

The phrase "with limited resources" can be rephrased/revised on line 71.

Lines 76-82 are some general findings, the authors might consider moving this part to the findings/discussion section.

Research Question

Authors may revise the structure of this section.

This study has the main research question: "Can data augmentation be effectively used for sentiment analysis in low-resource languages?" on lines 86 and 87, with 3-4 more specific questions. The authors can improve the presentation of these research questions to make them more reader-friendly.

In addition, the authors could include a better orientation paragraph instead of just saying "empirically, we have the following question for our proposed study" on line 85.

Literature Review

Authors may wish to revise the title of Section 3 "Related Work" to "Literature Review" and add an orientation paragraph between Section 3 and 3.1.

There are many literature reviews in this section. I suggest that authors use a table to organise these articles either thematically or chronologically. The table has several columns with authors, year, purpose, method, sample size and key findings. You can include your current study in the last row of the table so that the readers have a clear overview of how your study was systematically derived from the previous study.

More shortcomings of previous studies can be pointed out and a clearer research niche can be identified in paragraph lines 193-195.

In addition, please define and explain "morphology" and "inflexion systems" and how they relate to your present study.

Methodology

Please indicate the research approach, e.g. qualitative/quantitative/mixed approach, and the rationale before discussing the data.

Section 4.1 "Croatian Re-annotation" is a bit long, would the authors revise it to be more concise?

An orientation paragraph needs to be added before 4.2 and the content. In addition, the authors could put the content of these bullet points in a table.

After line 256, at the bottom of the page, the authors need to add more information about the footnote http://www.csfd.sz.

Authors may consider removing the subheading of "5. Methodology" and merging it with the previous section.

In this section, there should be more use of the past tense for the steps taken to analyse the data.

A flow chart can be used to present the steps from data generation and augmentation to morphological base, etc.

Lines 299 and 301 should avoid short forms, e.g. "hasn't looked", "task's".

The expansion and expansion-combination parts are interesting.

On lines 321 to 322, please explain more the reasons for "Theoretically, this assumption may hold for extremely polar classes, such as positive and negative, but it may fail for classes that are mixed or neutral.”

Can you add a reference or support for lines 351-353 "The assumption that a word's synonym will not affect .... enhancement techniques".

Please revise the statement "This method is greedy in nature" on line 378 to be less emotional for scientific writing. 

Define "cosine distance" on line 383.

The experiments in part 6 are important, but it seems quite repetitive with the previous sections. I suggest the authors rewrite part 5 and part 6 and try to combine similar information to make it more concise and keep the reader interested.

Findings and Discussion

An overview of the general findings of this study can be added at the beginning of the Results and Discussion section.

From lines 498 to 517, the authors can add implications of how these findings can be used. This will add value to the findings. In addition, how are these findings similar to or different from the studies in your literature review section? Also state the underlying reasons.

Section 7.1 is Error Analysis. Have the authors already reviewed "error analysis" in the literature review section? If not, a more comprehensive literature review on error analysis is suggested.

On line 523, "In these instances" refers to which instances? Do you mean the examples below from line 529 onwards?

On line 527 "The author marked the review as positive, but the model categorized it as negative". More explanation and analysis can be added.

Authors can provide more references for their results section, e.g. on pages 15-16. Links to other relevant studies can make the current data and discussion more vivid.

Authors may remove the subheadings from Section 7.2, 7.2.1-7.2.3. The content is the summary of the results and the discussion section. They may be presented in paragraph form.

Conclusion

 

The conclusion is too short. The authors could include more contributions, limitations and also theoretical and practical implications of the study. 

Comments on the Quality of English Language

Language

 

The manuscript needs minor editing, especially for the tenses used. However, the flow and meaning of the ideas are fluid and easy to understand.

Author Response

Comments 1: Title The author may remove the word "examine" from the title, as examining data may be more appropriate. 

Response 1: The title has been updated 

Comments 2: Abstract : 

Clearer research objectives and questions can be added to the abstract. 

Provide information about the data and selection criteria. 

Some general findings can be included. 

More details on practical and theoretical implications can be added. 

  • Response 2: The Abstract has been updated. 

Introduction 

Comments 3: Define "neural network", "hyperparameters" and "training set-learned parameters" and explain them to the readers. 

  • Response 3: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

Comments 4: Authors can add references for lines 27 to 34. 

  • Response 4: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

Comments 5: Explain the reason for "making systems more resistant to adversarial attacks" on line 46. 

  • Response 5: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

Comments 6: The sentence "The reported methods for languages with rich linguistic resources are based on linguistic resources" on line 54 is confusing. Please revise and explain. 

  • Response 6: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. Reformulated the sentence 

Comments 7: Please define and explain "Freebase" on line 55. 

  • Response 7: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. We removed the reference to Freebase here, as it is explained in more detail in the related work section.

Comments 8: On line 62, a research objective can be added before talking about the method of analysis "This article compares...". 

  • Response 8: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. Added research objective 

Comments 9: Please give a full form for "DA strategies" for the first time used by the author on line 63. 

  • Response 9: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. Added full form at the beginning of the paragraph 

Comments 10: The choice of data (e.g. South Slavic languages) is interesting and innovative. 

Comments 11: The phrase "with limited resources" can be rephrased/revised on line 71. 

  • Response 11: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. The sentence has been revised. 

Comments 12: Lines 76-82 are some general findings, the authors might consider moving this part to the findings/discussion section. 

  • Response 12: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. We have moved this part to the end of the findings and discussion section.

Research Question 

Comments 13: Authors may revise the structure of this section. 

This study has the main research question: "Can data augmentation be effectively used for sentiment analysis in low-resource languages?" on lines 86 and 87, with 3-4 more specific questions. The authors can improve the presentation of these research questions to make them more reader-friendly. 

In addition, the authors could include a better orientation paragraph instead of just saying "empirically, we have the following question for our proposed study" on line 85. 

  • Response 13: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. We have restructured the section as per the reviewers' recommendations. 

Literature Review 

Comments 14: Authors may wish to revise the title of Section 3 "Related Work" to "Literature Review" and add an orientation paragraph between Section 3 and 3.1. 

  • Response 14: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. We have added an orientation paragraph.

Comments 15:  There are many literature reviews in this section. I suggest that authors use a table to organise these articles either thematically or chronologically. The table has several columns with authors, year, purpose, method, sample size and key findings. You can include your current study in the last row of the table so that the readers have a clear overview of how your study was systematically derived from the previous study. 

  • Response 15: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. Added a table.  

Comments 16: More shortcomings of previous studies can be pointed out and a clearer research niche can be identified in paragraph lines 193-195. 

  • Response 16: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. We have added a paragraph to highlight the same. 

Comments 17: In addition, please define and explain "morphology" and "inflexion systems" and how they relate to your present study. 

  • Response 17: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. Added a paragraph 

Methodology 

Comments 18: Please indicate the research approach, e.g. qualitative/quantitative/mixed approach, and the rationale before discussing the data. 

  • Response 18: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. Added a paragraph 

Comments 19: Section 4.1 "Croatian Re-annotation" is a bit long, would the authors revise it to be more concise? 

  • Response 19: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. The paragraph has been shortened 

Comments 20: An orientation paragraph needs to be added before 4.2 and the content. In addition, the authors could put the content of these bullet points in a table. 

  • Response 20: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. Added the orientation paragraph 

Comments 21: After line 256, at the bottom of the page, the authors need to add more information about the footnote http://www.csfd.sz. 

  • Response 21: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. We removed the hyperlink, as the original dataset paper already describes the source.

Comments 22: Authors may consider removing the subheading of "5. Methodology" and merging it with the previous section. 

  • Response 22: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

Comments 23: In this section, there should be more use of the past tense for the steps taken to analyse the data. 

  • Response 23: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

Comments 24: A flow chart can be used to present the steps from data generation and augmentation to morphological base, etc. 

  • Response 24: Thank you for the suggestion, but we believe a flow chart would repeat information that is already presented as mathematical equations.

Comments 25: Lines 299 and 301 should avoid short forms, e.g. "hasn't looked", "task's". 

  • Response 25: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

The expansion and expansion-combination parts are interesting. 

Comments 26: On lines 321 to 322, please explain more the reasons for "Theoretically, this assumption may hold for extremely polar classes, such as positive and negative, but it may fail for classes that are mixed or neutral.” 

  • Response 26: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

Comments 27: Can you add a reference or support for lines 351-353 "The assumption that a word's synonym will not affect .... enhancement techniques". 

  • Response 27: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. Added reference 

Comments 28: Please revise the statement "This method is greedy in nature" on line 378 to be less emotional for scientific writing.  

  • Response 28: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

Comments 29: Define "cosine distance" on line 383. 

  • Response 29: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. We added the formula for cosine distance and a reference.
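For reference, cosine distance between two vectors u and v is commonly defined as 1 − (u·v)/(‖u‖‖v‖). The sketch below computes it in plain Python. It illustrates the standard definition and is not necessarily the exact formula added to the manuscript; the function name `cosine_distance` is hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")

    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))

    # The distance is undefined when either vector has zero length.
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")

    return 1.0 - dot / (norm_u * norm_v)
```

Under this definition the value ranges from 0 (same direction) through 1 (orthogonal) to 2 (opposite direction), so a smaller distance between two word embeddings indicates closer candidates.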

Comments 30: The experiments in part 6 are important, but it seems quite repetitive with the previous sections. I suggest the authors rewrite part 5 and part 6 and try to combine similar information to make it more concise and keep the reader interested. 

  • Response 30 : Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. Merged the parts into one single part.  

Comments 31: Findings and Discussion 

An overview of the general findings of this study can be added at the beginning of the Results and Discussion section. 

  • Response 31 : Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. Moved the general findings of the study here. 

Comments 32: From lines 498 to 517, the authors can add implications of how these findings can be used. This will add value to the findings. In addition, how are these findings similar to or different from the studies in your literature review section? Also state the underlying reasons. 

  • Response 32: Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. We have linked a similar study for Norwegian.

Comments 33: Section 7.1 is Error Analysis. Have the authors already reviewed "error analysis" in the literature review section? If not, a more comprehensive literature review on error analysis is suggested. 

  • Response 33 : Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

Comments 34: On line 523, "In these instances" refers to which instances? Do you mean the examples below from line 529 onwards? 

  • Response 34 : Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

Comments 35: On line 527 "The author marked the review as positive, but the model categorized it as negative". More explanation and analysis can be added. 

  • Response 35 : Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

Comments 36: Authors can provide more references for their results section, e.g. on pages 15-16. Links to other relevant studies can make the current data and discussion more vivid. 

  • Response 36 : Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. We have added the references to the related studies 

Comments 37: Authors may remove the subheadings from Section 7.2, 7.2.1-7.2.3. The content is the summary of the results and the discussion section. They may be presented in paragraph form. 

  • Response 37 : Thank you for pointing this out. We agree with the comment. We have updated the text as per the suggestion. 

Conclusion 

  

Comments 38: The conclusion is too short. The authors could include more contributions, limitations and also theoretical and practical implications of the study.  

  • Response 38: Thank you for pointing this out. We agree with the comment. We have updated the conclusion section. 

 

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

(1) Please check the manuscript carefully to remove the typos, improve the language and format.

E.g.

-The letter in each blank should be upper case. Please check the data Mintz et al. [2009].

...

(2) Some paragraphs are too long and difficult to follow, e.g. Section 1 (Page 2), Section 5 (Page 14). Please divide them into several short paragraphs to improve the readability.

(3) In Table 1, it is difficult to judge the sample size from the number of the datasets.

(4) Is Table 2 the continuation of Table 1? If so, they should have the same table index.

(5) NA is in Table 3. Why?

(6) The paper is too long. Some well-known knowledge and unnecessary experiments can be removed or shortened/condensed, since they can be easily found in textbooks and are not first proposed in this paper.

Comments on the Quality of English Language

Good.

Author Response

Comment: (1) Please check the manuscript carefully to remove the typos, improve the language and format. 

E.g. The letter in each blank should be upper case. Please check the data Mintz et al. [2009]. 

Response 1: We have performed the language check, and we have updated the citation texts for all the references. 

... 

Comment: (2) Some paragraphs are too long and difficult to follow, e.g. Section 1 (Page 2), Section 5 (Page 14). Please divide them into several short paragraphs to improve the readability. 

Response 2: We have split the long paragraphs into several shorter ones.

Comment: (3) In Table 1, it is difficult to judge the sample size from the number of datasets. 

Response 3: We have updated the table with a sample size column.

Comment: (4) Is Table 2 the continuation of Table 1? If so, they should have the same table index.

  • Response 4: We have updated the table numbering.

Comment: (5) NA is in Table 3. Why? 

  • Response 5: The original authors did not provide validation datasets, so we used 10% of the training set. We have updated the table values to match the splits used in our experiments.

Comment:  (6) The length of this paper is too long. Some well-known knowledge and unnecessary experiments can be removed or shortened/condensed, since they can be easily found in textbooks, and are not firstly proposed in this paper. 

  • Response 6: We agree that the paper is long. We have removed some repetitive sentences from the Croatian re-annotation section. We would like to retain the experiments, as results for Croatian and other Slavic languages have not been reported before. In future, these results could be cited from this article without having to rerun the experiments.
  • If required, we can delete the definitions of neural nets and other related items that were requested by the reviewers, as these definitions can be found in textbooks and are something of a prerequisite for reading work that involves neural nets.

 

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have addressed my concerns. I have no further comments on the revised manuscript.

All the best! 

Author Response

Thank you for your comments.
