Hidden Variable Models in Text Classification and Sentiment Analysis
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper presents an investigation that builds on research carried out by A. S. Bakhtiari at Concordia University in 2014. It is an interesting work, although the way it is presented is somewhat confusing. In particular, I would highlight the need to revise Section 2 (Related Work) and Section 3 (Proposed Model).
The authors should check that all abbreviations are introduced or referenced at first use, for example LSTM on line 99, among others. In addition, the narrative is confusing because the results and derivations from the appendices are mixed into the text without adequate justification. I believe that one of the following two options would be more appropriate: either the derivations from the appendices are moved into the corresponding sections, or the sections simply present the results obtained and the appendices give the more detailed explanation.
In its current form, the reader must jump around the text to obtain a coherent narrative. For example, variables such as w and m are used in Section 2 but are not introduced until subsection 3.1. These problems are repeated almost identically in Sections 3.1 and 3.2.
In the presentation of the experimental results, the concept of success rate, as applied to the comparison of results, must be defined. In addition, Table 1 presents the topics considered. The selection of these topics is strange and should be discussed, since 'he', 'go' or 'one' do not seem to be topics of sufficient importance.
For these reasons, a reworking of the article is recommended before it can be recommended for publication.
Author Response
Hello,
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript presents two novel methods (GDMPCA and BLMPCA, proposed by the authors) for both text classification and sentiment analysis, which seem to improve on the existing MPCA model employed for multi-topic modeling and text classification.
I appreciate the clear structure of the manuscript, which follows the outline of a proper scientific paper. In the introductory part, the authors put forth the hypotheses, the objectives and what they set out to achieve; the related work section presents currently established methods within the topic of the manuscript, citing and discussing a sufficient number of relevant papers in the field; in the proposed model section, the authors present their own two models, highlighting their advantages and limitations and discussing the related mathematical apparatus; in the experimental results, the authors compare their proposed models with the well-established MPCA model and thus present the performance of the GDMPCA and BLMPCA models, both for text classification and for sentiment analysis; there is also a discussion section that comes as an extension of the last section, the conclusions, which are adequately drawn.
I have no concerns with this manuscript and I appreciate its high scientific value, which is also supported by the clear mathematical apparatus.
However, there are a few minor oversights that could be addressed to further improve the quality of the manuscript:
- Line 483: the first letter of the word “we” should be upper case;
- Table A1 in Appendix 1 should be rotated 90° counterclockwise and the equations rearranged so that the table’s width fits a portrait-oriented page; there is a lot of unused space between the columns, so the table can easily be oriented horizontally.
Finally, I congratulate the authors for their contributions.
Author Response
Hello,
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
General ideas
The tables are well-organized, presenting key metrics such as perplexity and time complexity for various configurations across different datasets.
The results include an analysis of time complexity, providing insight into the computational efficiency of the models. This is important for understanding the practical applicability of the models to large datasets.
The paper provides extensive empirical evidence showing that GDMPCA and BLMPCA achieve lower perplexity scores and higher classification accuracy across multiple datasets compared to the MPCA model. These metrics directly support the claim of superior performance.
Suggestions
While the results are clearly presented, the paper could benefit from a more detailed discussion of the implications of these results. For instance, an analysis of why certain models perform better on specific datasets or tasks would add value. The methodology section is well-detailed but lacks specificity in some areas, such as parameter selection and model optimization processes. Additional details would improve reproducibility.
The paper primarily compares the proposed models with the MPCA model. Including comparisons with other state-of-the-art models in text classification and sentiment analysis would provide a more comprehensive view of the models' performance relative to the broader research field. The comparison with baseline models is adequate, yet incorporating more recent or state-of-the-art models as benchmarks could offer a more comprehensive evaluation of the proposed models' effectiveness.
Benchmarking Against Recent Models. Include a comparison with recent deep learning-based models such as BERT and GPT-3, or with other statistical models that have shown promising results in text classification and sentiment analysis. This would help to contextualize the performance of GDMPCA and BLMPCA within the current state of the art.
Performance Metrics. Expand the evaluation metrics beyond classification accuracy to include measures such as F1-score, precision, recall, and AUC (Area Under the ROC Curve). These metrics provide a more nuanced view of model performance, especially on datasets with imbalanced classes.
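To illustrate why accuracy alone can mislead on imbalanced classes, the following minimal sketch computes accuracy, precision, recall and F1 for two classifiers. The labels and predictions are hypothetical and invented for illustration only; they are not taken from the manuscript, and the function name `binary_metrics` is an assumption of this sketch.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced test set: 90 negative and 10 positive documents.
y_true = [0] * 90 + [1] * 10

# Classifier A always predicts the majority (negative) class.
pred_a = [0] * 100

# Classifier B finds 5 of the 10 positives and raises 5 false alarms.
pred_b = [0] * 85 + [1] * 5 + [1] * 5 + [0] * 5

metrics_a = binary_metrics(y_true, pred_a)
metrics_b = binary_metrics(y_true, pred_b)
print(metrics_a)
print(metrics_b)
```

In this constructed case both classifiers reach the same accuracy, while only the second has a non-zero F1-score, which is the kind of difference the additional metrics would expose.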
Lines 120-130. Clarification on the choice of hyperparameters for the GDMPCA and BLMPCA models would be beneficial.
Tables 1 and 2. More detailed captions explaining the significance of the results presented and how they support the models' advantages would enhance understanding.
Figure 3. A discussion on the implications of the success rates shown and their impact on the practical application of the proposed models in real-world scenarios is recommended.
Section 4.2 (Topic Modeling for Medical Text). Expanding this section to address how the models handle domain-specific terminology and abbreviations common in medical texts could highlight the models' adaptability and utility.
The introduction could be enhanced by directly referencing prior works that introduced or significantly advanced the understanding of Dirichlet, Generalized Dirichlet, and Beta-Liouville distributions in the context of text analysis.
While the proposed models offer greater flexibility, they also introduce additional parameters and complexity. It would be important for the research to assess the impact of this complexity on computational efficiency and scalability, especially for large datasets.
To fully validate the superiority of the proposed models, a comprehensive comparison with existing state-of-the-art models in text classification and sentiment analysis is essential. Consider aspects like interpretability and computational efficiency.
The advanced statistical terminology may pose challenges for readers unfamiliar with these areas. Including a more intuitive explanation could enhance accessibility for a broader audience.
A more detailed discussion on potential limitations and directions for future research could enhance the reader's understanding of the models' applicability and potential areas for improvement.
Author Response
Hello,
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have considered the suggestions made in the review and have revised the work satisfactorily, so I can recommend its publication.
Author Response
Hi,
Thank you for your positive feedback and recommendation for the publication of our manuscript.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors improved the manuscript.
Author Response
Hi,
Thank you for your positive feedback and recommendation for the publication of our manuscript.