Cross-Domain Text Sentiment Analysis Based on CNN_FT Method
Round 1
Reviewer 1 Report
The paper applies a set of machine-learning methods to the task of cross-domain sentiment classification. The goal is to show that convolutional neural networks outperform traditional approaches such as Naive Bayes or SVM. The experimental setup involves training the machine learners in one domain and applying them to another. The training includes a step-by-step inclusion of more and more training samples from the target domain, to see how much target-domain training data is needed to deliver good results.
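For concreteness, the k-sweep described above amounts to the following loop. This is a toy sketch with synthetic features and a stand-in classifier; the paper's actual corpora, features, and CNN are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins for the two domains: random feature vectors with a shift.
# In the paper these would be review texts from, e.g., Books and DVD.
X_src, y_src = rng.normal(0.0, 1.0, (2000, 50)), rng.integers(0, 2, 2000)
X_tgt, y_tgt = rng.normal(0.5, 1.0, (4000, 50)), rng.integers(0, 2, 4000)
X_tgt_train, y_tgt_train = X_tgt[:2000], y_tgt[:2000]
X_tgt_test, y_tgt_test = X_tgt[2000:], y_tgt[2000:]

# Step-by-step inclusion of k target-domain samples in the training set.
for k in [0, 200, 500, 1000, 2000]:
    X = np.vstack([X_src, X_tgt_train[:k]])
    y = np.concatenate([y_src, y_tgt_train[:k]])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(f"k={k:4d}  target accuracy={clf.score(X_tgt_test, y_tgt_test):.3f}")
```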
The idea of the paper is interesting and scientifically relevant. However, the results are not convincing and the reported numbers lack detail. I suggest that these critical points be addressed before the paper is considered for publication:
* It is not shown how well the machine learners perform intra-domain classification, i.e. k = 2000 and no samples from the other domain (k is the number of target-domain samples included in training). Without this, the bold statements by the authors, "achieves state-of-the-art performance" and "can effectively solve the transfer learning problem", are invalid, because we do not know how well a "pure" model would have worked.
* The comparison with traditional machine-learning methods only uses k = 500, even though k = 0, k = 200, k = 1000, and k = 2000 would also be interesting and necessary for comparison.
* The numbers for complete cross-domain transfer (k = 0) are not convincing. Only after training samples from the target domain are added do they improve.
* The results for the traditional machine learners do not really seem to be much worse than those for the CNN.
* When evaluating on the English Amazon corpus, the authors suddenly use k = 50. This target-domain sample size was not used before, and it is not made clear why this sudden change is necessary. The same setup with k = 0, k = 200, ..., would be necessary.
* The description of the Chinese and English corpora makes it look like they are widely used in the literature. However, the reference for the Amazon corpus leads to a paper that (i) does not describe work on sentiment analysis and (ii) does not mention a corpus assembled from Amazon reviews, but one from the Wall Street Journal. I cannot check the Chinese corpus, as it would require me to pay for the article. From what I see here, I would assume that at least the Amazon corpus is a self-downloaded corpus.
* There are no significance tests to confirm the improvements, even though the authors report that "our method can SIGNIFICANTLY outperform state-of-the-art methods", that "the CNN model for transfer learning is SIGNIFICANTLY improved", and that "transferring from the Book domain to the DVD domain is SIGNIFICANTLY improved by 5%". Without a test, these statements remain a subjective assessment.
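To illustrate this request: one common choice is a paired test over per-task accuracies of the two systems on the same source->target transfer pairs (McNemar's test on per-example predictions is an alternative). A minimal sketch with scipy follows; the accuracy values are invented placeholders, not numbers from the paper.

```python
from scipy.stats import ttest_rel, wilcoxon

# Hypothetical per-task accuracies of two systems over the same
# source->target transfer pairs (placeholder values for illustration).
cnn_ft = [0.83, 0.81, 0.79, 0.84, 0.80, 0.82, 0.78, 0.85, 0.81, 0.80, 0.83, 0.82]
dann   = [0.78, 0.77, 0.76, 0.80, 0.77, 0.79, 0.75, 0.81, 0.78, 0.76, 0.79, 0.78]

t_stat, p_t = ttest_rel(cnn_ft, dann)   # paired t-test across tasks
w_stat, p_w = wilcoxon(cnn_ft, dann)    # non-parametric alternative
print(f"paired t-test: t={t_stat:.2f}, p={p_t:.4f}")
print(f"Wilcoxon:      W={w_stat:.1f}, p={p_w:.4f}")
```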
Author Response
Response to Reviewer 1 Comments
Dear Reviewer:
Thank you very much for your kind comments on our manuscript (No. 461702). Based on your and the other reviewers' suggestions, we have carefully revised our manuscript.
We are now sending the revised article for your reconsideration for publication in the journal Information. Please see our point-by-point responses to all your comments below, and the corresponding revisions in the body of the manuscript, both marked in red. We look forward to hearing from you soon for a favorable decision.
Thank you again for your time and consideration.
Sincerely,
Jiana Meng
Mar. 31, 2019
Below you will find our point-by-point responses to your comments:
Point 1: It is not shown how well the machine learners perform intra-domain classification, i.e. k = 2000 and no samples from the other domain (k is the number of target-domain samples included in training). Without this, the bold statements by the authors, "achieves state-of-the-art performance" and "can effectively solve the transfer learning problem", are invalid, because we do not know how well a "pure" model would have worked.
Response 1: We have added the intra-domain classification results for the machine learning methods. Table 4 on page 8 of this paper shows the classification results when training and testing within the same domain. Moreover, to compare with our transfer learning method, Table 5 on page 8 reports results for machine learning methods trained without any target-domain data.
Point 2: The comparison with traditional machine-learning methods only uses k = 500, even though k = 0, k = 200, k = 1000, and k = 2000 would also be interesting and necessary for comparison.
Response 2: We have added the experimental results for k = 200, k = 1000, and k = 2000; see Figure 4 on page 9 of this paper.
Point 3: The numbers for complete cross-domain transfer (k = 0) are not convincing. Only after training samples from the target domain are added do they improve.
Response 3: The results of the zero-sample (k = 0) transfer experiment have been removed, and the experimental results for k = 2000 have been added; see Figure 3 on page 9 of this paper.
Point 4: The results for the traditional machine learners do not really seem to be much worse than those for the CNN.
Response 4: The last column of Figure 4 compares the average results: the CNN is about 5% higher than the best traditional method. We have converted Figure 4 into Table 6 to show the exact accuracies; see Table 6 on page 8 of this paper.
Point 5: When evaluating on the English Amazon corpus, the authors suddenly use k = 50. This target-domain sample size was not used before, and it is not made clear why this sudden change is necessary. The same setup with k = 0, k = 200, ..., would be necessary.
Response 5: We use k = 50 in this experiment to allow a direct comparison with Reference [4], whose experiments use 50 target-domain samples.
Point 6: The description of the Chinese and English corpora makes it look like they are widely used in the literature. However, the reference for the Amazon corpus leads to a paper that (i) does not describe work on sentiment analysis and (ii) does not mention a corpus assembled from Amazon reviews, but one from the Wall Street Journal. I cannot check the Chinese corpus, as it would require me to pay for the article. From what I see here, I would assume that at least the Amazon corpus is a self-downloaded corpus.
Response 6: The reference was cited incorrectly; it should be the fourth reference in the article, not the thirteenth. We have corrected this in the revised paper.
Point 7: There are no significance tests to confirm the improvements, even though the authors report that "our method can SIGNIFICANTLY outperform state-of-the-art methods", that "the CNN model for transfer learning is SIGNIFICANTLY improved", and that "transferring from the Book domain to the DVD domain is SIGNIFICANTLY improved by 5%". Without a test, these statements remain a subjective assessment.
Response 7: The best result of the CNN_FT experiment can be seen in the second subplot of Figure 5 on page 10. When transferring from the Book domain to the DVD domain, the best result of DANN is 78.3%, while our proposed method achieves 83.25%, i.e. an improvement of about 5% for CNN_FT over DANN.
Author Response File: Author Response.pdf
Reviewer 2 Report
This is in general an interesting paper discussing parameter-based transfer learning. The paper proposes fine-tuning the last layer of a CNN trained on shared data to make it domain-adapted. This significantly reduces the amount of supervision needed from the target domain. The method is to some extent intuitive, since most domain differences may lie in higher-level language phenomena, while the common features, such as words and phrases, are shared. The experiments look sound and support the idea. Still, I believe the following points can be improved:
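A minimal PyTorch sketch of the parameter-transfer idea under review: train a small text CNN on the labelled source domain, then freeze the shared layers and fine-tune only the final classifier on the few labelled target samples. The architecture, sizes, and learning rate here are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Simplified text CNN: embedding -> 1-D convolution -> max pooling -> classifier."""
    def __init__(self, vocab_size=10000, emb_dim=128, n_filters=100, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(n_filters, n_classes)  # layer fine-tuned on the target domain

    def forward(self, x):                          # x: (batch, seq_len) token ids
        h = self.embed(x).transpose(1, 2)          # (batch, emb_dim, seq_len)
        h = torch.relu(self.conv(h)).max(dim=2).values  # global max pooling
        return self.fc(h)

model = TextCNN()
# ... train all parameters on the labelled source domain here ...

# Transfer step: freeze the shared feature extractor, then fine-tune only
# the final classifier on the k labelled target-domain samples.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)
```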
The title should emphasize CNN-ft to highlight the strength.
Line 16, "does not need to re-train": this is a confusing expression and could perhaps be deleted.
In Related Work Section 2.2, a large body of literature on lexicon-based methods is missing. The authors could add one more paragraph and refer to articles such as "Sentiment, emotion, purpose, and style in electoral tweets" and "Cognitive-inspired domain adaptation of sentiment lexicons".
Line 129, "a small part": how small? Maybe add (k = 200, 500, 1000 out of 4000).
Figure 2: why is the source domain trained with Adam and the target domain with SGD?
Line 208: the math symbols are not clear; the use of xs and xt is also confusing and needs to be explained.
In terms of presentation, Figures 3, 4, and 5 should start at 50% accuracy, to clearly show the bar heights; Figure 3 should add k = 4000, to show how fast the performance approaches the gold standard. In comparison, methods like NB and LR are less important but can still be included.
Does Figure 5 need an average column?
Author Response
Response to Reviewer 2 Comments
Dear Reviewer:
Thank you very much for your kind comments on our manuscript (No. 461702). Based on your and the other reviewers' suggestions, we have carefully revised our manuscript.
We are now sending the revised article for your reconsideration for publication in the journal Information. Please see our point-by-point responses to all your comments below, and the corresponding revisions in the body of the manuscript, both marked in red. We look forward to hearing from you soon for a favorable decision.
Thank you again for your time and consideration.
Sincerely,
Jiana Meng
Mar. 31, 2019
Below you will find our point-by-point responses to your comments:
Point 1: The title should emphasize CNN-ft to highlight the strength.
Response 1: We have revised the title to "Cross-Domain Text Sentiment Analysis Based on CNN_FT Method".
Point 2: Line 16, "does not need to re-train": this is a confusing expression and could perhaps be deleted.
Response 2: After the model is trained in the source domain, it is transferred to the target domain, where only fine-tuning of the network structure is needed to perform sentiment analysis.
Point 3: In Related Work Section 2.2, a large body of literature on lexicon-based methods is missing. The authors could add one more paragraph and refer to articles such as "Sentiment, emotion, purpose, and style in electoral tweets" and "Cognitive-inspired domain adaptation of sentiment lexicons".
Response 3: We have added the corresponding content to the paper; see the last paragraph on page 2.
Point 4: Line 129, "a small part": how small? Maybe add (k = 200, 500, 1000 out of 4000).
Response 4: We have added an explanation in the text; see the third section on page 3 of the paper.
Point 5: Figure 2: why is the source domain trained with Adam and the target domain with SGD?
Response 5: We do not want to overfit the source domain, so we choose Adam, which converges quickly; SGD is more precise in the target domain, so the training effect of fine-tuning is better.
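A toy sketch of this two-stage optimizer schedule; the model, synthetic data, learning rates, and step counts below are placeholders, not the paper's values:

```python
import torch
import torch.nn as nn

# Toy stand-in for the CNN: random features and binary labels (illustration only).
model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 2))
criterion = nn.CrossEntropyLoss()
X_src, y_src = torch.randn(512, 100), torch.randint(0, 2, (512,))  # "source" data
X_tgt, y_tgt = torch.randn(50, 100), torch.randint(0, 2, (50,))    # k=50 target samples

# Stage 1: train everything on the source domain with Adam, which converges
# quickly; training is kept short so the model does not overfit the source.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):
    opt.zero_grad()
    criterion(model(X_src), y_src).backward()
    opt.step()

# Stage 2: fine-tune only the last layer on the target samples with SGD,
# whose smaller, steadier steps suit the low-data fine-tuning regime.
opt = torch.optim.SGD(model[-1].parameters(), lr=1e-2)
for _ in range(20):
    opt.zero_grad()
    criterion(model(X_tgt), y_tgt).backward()
    opt.step()
```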
Point 6: Line 208: the math symbols are not clear; the use of xs and xt is also confusing and needs to be explained. In terms of presentation, Figures 3, 4, and 5 should start at 50% accuracy, to clearly show the bar heights; Figure 3 should add k = 4000, to show how fast the performance approaches the gold standard. In comparison, methods like NB and LR are less important but can still be included.
Response 6: xs represents sentences in the source domain and xt represents sentences in the target domain; we have added this explanation in the paper (line 175 on page 4 and line 209 on page 6). We have modified all the diagrams so that they start at 50% accuracy. The target-domain data set contains 4,000 samples in total, so with k = 4000 there would be no test data left.
Point 7: figure 5 need an average column?
Response 7: We have added an average column to Figure 5; see Figure 5 on page 10 of the paper.
Author Response File: Author Response.pdf
Reviewer 3 Report
The paper proposes a particular application of the classification problem. The authors use Convolutional Neural Networks (CNN) for their aim. Moreover, they could also take into account Cellular Nonlinear Networks (CNN) as a classifier: the same acronym, a different approach, the same task.
Therefore I invite the authors to include the following paper in the references:
Fortuna, L.; Arena, P.; Bálya, D.; Zarándy, A. Cellular neural networks: A paradigm for nonlinear spatio-temporal processing. IEEE Circuits and Systems Magazine 2001, 1(4), 6-21.
Author Response
Dear Reviewer:
Thank you very much for your kind comments on our manuscript (No. 461702). Based on your and the other reviewers' suggestions, we have carefully revised our manuscript.
We are now sending the revised article for your reconsideration for publication in the journal Information. Please see our point-by-point responses to all your comments below, and the corresponding revisions in the body of the manuscript, both marked in red. We look forward to hearing from you soon for a favorable decision.
Thank you again for your time and consideration.
Sincerely,
Jiana Meng
Point 1: Therefore I invite the authors to include the following paper in the references: Fortuna, L.; Arena, P.; Bálya, D.; Zarándy, A. Cellular neural networks: A paradigm for nonlinear spatio-temporal processing. IEEE Circuits and Systems Magazine 2001, 1(4), 6-21.
Response 1: We have added this relevant reference to the paper; see Reference [10].
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The authors have worked on the issues I suggested and included the necessary points in the paper.
As a final point, I would advise improving the Conclusion section, even though I do not think a revised version needs to be resubmitted for the paper to be accepted. To my understanding, the conclusion should summarize what has been done, the problems that were found, and the results that have been achieved. The present conclusion is rather short.
Author Response
Dear Reviewer:
Thank you very much for your kind comments on our manuscript (No. 461702). Based on your and the other reviewers' suggestions, we have carefully revised our manuscript.
Below you will find our point-by-point responses to your comments:
We have added an analysis of the experimental results and improved the Conclusions section in the revised paper.
Sincerely,
Jiana Meng
Apr. 13, 2019
Reviewer 2 Report
The related work section is incomplete: a large body of literature on lexicon-based methods is missing. The authors could add one more paragraph and add references to articles such as "Sentiment, emotion, purpose, and style in electoral tweets" and "Cognitive-inspired domain adaptation of sentiment lexicons".
Author Response
Dear Reviewer:
Thank you very much for your kind comments on our manuscript (No. 461702). Based on your and the other reviewers' suggestions, we have carefully revised our manuscript.
Below you will find our point-by-point responses to your comments:
We have added the lexicon-based methods and the relevant references in the revised paper; see References [25,26].
Sincerely,
Jiana Meng
Apr. 13, 2019
Author Response File: Author Response.pdf
Round 3
Reviewer 2 Report
The authors have addressed the review feedback and the content is complete.
I recommend accepting the article after further improvement of the English presentation.