Article
Peer-Review Record

Classification of Full Text Biomedical Documents: Sections Importance Assessment

Appl. Sci. 2021, 11(6), 2674; https://doi.org/10.3390/app11062674
by Carlos Adriano Oliveira Gonçalves 1,2,3,†,‡, Rui Camacho 4,‡, Célia Talma Gonçalves 5,‡, Adrián Seara Vieira 1,2,3,‡, Lourdes Borrajo Diz 1,2,3,‡ and Eva Lorenzo Iglesias 1,2,3,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 8 February 2021 / Revised: 7 March 2021 / Accepted: 10 March 2021 / Published: 17 March 2021

Round 1

Reviewer 1 Report

Overall the manuscript is presented well. I only have two minor comments: 1) How were the weights assigned to the different sections? Will the model performance change if different weights are assigned? 2) How were the hyperparameters of the SVM chosen? Was there any tuning involved?

Author Response

Overall the manuscript is presented well. I only have 2 minor comments

1) How were the weights assigned for different sections?

The chosen weights are experimental combinations, intended to demonstrate that section-based term-weighting affects the performance of the classifiers.

Specifically, 43 different weight combinations were tested for the 6 sections included in the documents (Title, Abstract, Introduction, Methods, Results, and Conclusions) of the 12 corpora analyzed. Some combinations assign a different weight to each section, while others assign weights to groups of sections. The weight values vary in steps of 20%.

Each section (or combination of sections) was processed as if it were a document, with its own vocabulary, and then weighted according to the percentage assigned in each case.
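As a rough illustrative sketch of this per-section processing (hypothetical names, not the authors' Weka pipeline), one could combine per-section term counts scaled by the assigned weights:

```python
from collections import Counter

def weighted_document_vector(sections, weights):
    """Combine per-section term counts into one weighted term vector.

    sections: dict mapping section name -> list of tokens
    weights:  dict mapping section name -> weight (e.g. 0.4 for 40%)
    A section whose weight is 0 is effectively excluded.
    """
    vector = Counter()
    for name, tokens in sections.items():
        w = weights.get(name, 0.0)
        if w == 0.0:
            continue
        # each section is processed as if it were its own document
        for term, count in Counter(tokens).items():
            vector[term] += w * count
    return dict(vector)

doc = {
    "title": ["svm", "classification"],
    "abstract": ["svm", "kernel", "classification"],
    "introduction": ["background", "svm"],
}
weights = {"title": 0.4, "abstract": 0.4, "introduction": 0.2}
vector = weighted_document_vector(doc, weights)
```

A term appearing in several highly weighted sections thus dominates the final vector, which is the effect the experiments measure.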

 

Will the model performance change if different weights were assigned?

Indeed, the objective of the article is precisely to test how the efficiency of the classifier changes when different weights are previously assigned to the sections that compose the documents.

For the corpora chosen in this study (clinical documents taken from PubMed), the results show that the classification improves when the Introduction section is considered, in addition to the Title and Abstract. But this depends on the specific vocabulary of each corpus, and how it is distributed throughout the document.

The Introduction section contributes significantly to increasing the performance of the SVM classifier. The combinations that include the Title and Abstract with a weight of 40%, and the Introduction with weights between 20% and 40%, obtain the best results, which are similar to (and on some occasions better than) those obtained with the full text.

To apply the term-weighting scheme to another corpus, the best combination needs to be determined experimentally. Automating this calculation would undoubtedly improve the system, and is a clear line of future research.
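A minimal sketch of that experimental search, assuming candidate weights are drawn from 20% steps per section (hypothetical helper names, not the authors' code):

```python
from itertools import product

def weight_combinations(section_names, step=0.2):
    """Enumerate candidate weight assignments, one level per section,
    in increments of `step` between 0 and 1 (e.g. 0%, 20%, ..., 100%)."""
    levels = [round(i * step, 2) for i in range(int(round(1 / step)) + 1)]
    for combo in product(levels, repeat=len(section_names)):
        if any(combo):  # skip the all-zero assignment
            yield dict(zip(section_names, combo))

# A grid search would then evaluate the classifier once per combination
# and keep the assignment with the best score (e.g. the Kappa statistic).
combos = list(weight_combinations(["title", "abstract"], step=0.5))
```

In practice the grid would be pruned (the paper tests 43 combinations, not the full Cartesian product over six sections).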

 

2) How were the hyperparameters of SVM chosen? Was there any tuning involved?

No tuning of the classifier parameters was performed; instead, the default values of the SMO algorithm, as implemented in Weka, were applied.

In any case, the objective of this paper is neither to test the efficiency of a classification algorithm nor to find the best parameter values for it, but to check whether its efficiency increases by varying the weight of the document sections (and, consequently, of the words that compose each section).

To verify that the improvement achieved was due exclusively to the modification of the section weights, we avoided optimizing the classifier parameterization.

 

To clarify this, the paragraph of the Abstract:

“The main goal of this study is to analyze the efficiency of text classification algorithms when a section weighing scheme is applied. The scheme takes into account the place (section) where terms are located in the document, and each section has a weight that can be modified depending on the corpus. To carry out the study, an extended version of the OHSUMED corpus with full documents has been created. Through the use of WEKA, we compared the use of abstracts only with that of full texts, as well as the use of section weighing combinations to assess their significance in the scientific article classification process.”

has been replaced by

“The objective of this study is to test how the efficiency of the text classification changes if different weights are previously assigned to the sections that compose the documents. The proposal takes into account the place (section) where terms are located in the document, and each section has a weight that can be modified depending on the corpus.

To carry out the study, an extended version of the OHSUMED corpus with full documents has been created. We compared the use of abstracts only with that of full texts, as well as the use of section weighing combinations to assess their significance in the scientific article classification process, using SMO (Sequential Minimal Optimization), the Weka implementation of the Support Vector Machine (SVM) algorithm.”

Reviewer 2 Report

This paper is a study that performed full text document classification on biomedical documents. The method of classifying documents by section weighting scheme is meaningful.

Recently, there have been many studies on document classification using deep learning. There is no discussion of deep learning-based document classification in this paper, nor is there a comparative experiment. It would be good to add the reasons for excluding deep learning methods and for using the relatively old SVM.

The idea proposed is too simple. Furthermore, there is too little description of the core idea.

The methods used in this paper are a little old. There are no recent papers in the references; all of them are at least three years old.

 

Author Response

Attached

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors analyzed the efficiency of text classification taking into account section weighing scheme. It is an up-to-date and interesting research topic.

The abstract states "The main goal of this study is to analyze the efficiency of text classification algorithms", which might suggest that there will be more algorithms; however, only the SMO classifier was analyzed. Thus, I suggest changing the abstract and perhaps even being more specific in the paper's title.

It would look much better if the last paragraph of the introduction introduces the paper's whole structure, not just selected sections (2, 3.2, and 6).

The topic of the paper seems to be pretty hot; e.g., there is a survey from 2019 which has over 200 citations: Kowsari, Kamran, et al. "Text classification algorithms: A survey." Information 10.4 (2019): 150. In this context, it is quite surprising that among the references there are only six entries from the last five years, and none from the previous two years. I understand that there may not be many sources concerning biomedical documents, but perhaps other types of documents should be taken into consideration.

Some papers which I found concerning the importance of sections:

- Habib, R., & Afzal, M. T. (2019). Sections-based bibliographic coupling for research paper recommendation. Scientometrics, 119(2), 643-656.

- Li, T., & Lepage, Y. (2019). Informative sections and relevant words for the generation of NLP article abstracts. In AMJANLP (pp. 1281-1284).

- Collins, E., Augenstein, I., & Riedel, S. (2017). A supervised approach to extractive summarisation of scientific papers. arXiv preprint arXiv:1706.03946.

As the authors had conducted related research with two experiments (titles and abstract vs. full text) using the same OHSUMED corpora, I suggest adding some brief comparison of this research to the previous one (which is indeed different but related): Gonçalves, C. A., Iglesias, E. L., Borrajo, L., Camacho, R., Vieira, A. S., & Gonçalves, C. T. (2019, May). Comparative study of feature selection methods for medical full text classification. In International Work-Conference on Bioinformatics and Biomedical Engineering (pp. 550-560). Springer, Cham.

In Table 3, maybe it could be useful to highlight somehow the best weighing combination for each corpus or create a heatmap that would clearly show the best combinations. 

In some of the corpora, e.g. c14, only few combinations reached kappa 0.6. It could be worth investigating if there is any difference between c14 and other corpora, such as  

In Table 1, 26 corpora are presented, but in Table 3, the results for only 12 of them are presented. If there was any selection process, it should be described. I can imagine that c21 or c24 can be excluded as there are not many relevant documents there, but what about c23?

Based on your research, is it possible to rank each individual section in terms of its usefulness in the classification process?

I also suggest adding in the discussion a description of the limitations of the presented research, maybe examples of the unsuccessful classification and its reasons. 

Minor issues:

  • p. 1 "commnunity" => "community",
  • p. 1 "enriched datasets with text from certain sections achieves" => "enriched datasets with text from certain sections achieve",
  • The reference to [26] should be: Westergaard, David, et al. "A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts." PLoS computational biology 14.2 (2018): e1005962.

 

Author Response

The authors analyzed the efficiency of text classification taking into account section weighing scheme. It is an up-to-date and interesting research topic.

The abstract states "The main goal of this study is to analyze the efficiency of text classification algorithms", which might suggest that there will be more algorithms; however, only the SMO classifier was analyzed. Thus, I suggest changing the abstract and perhaps even being more specific in the paper's title.

To clarify this, the paragraph

“The main goal of this study is to analyze the efficiency of text classification algorithms when a section weighing scheme is applied. The scheme takes into account the place (section) where terms are located in the document, and each section has a weight that can be modified depending on the corpus. To carry out the study, an extended version of the OHSUMED corpus with full documents has been created. Through the use of WEKA, we compared the use of abstracts only with that of full texts, as well as the use of section weighing combinations to assess their significance in the scientific article classification process.”

has been replaced by

“The objective of this study is to test how the efficiency of the text classification changes if different weights are previously assigned to the sections that compose the documents. The proposal takes into account the place (section) where terms are located in the document, and each section has a weight that can be modified depending on the corpus.

To carry out the study, an extended version of the OHSUMED corpus with full documents has been created. We compared the use of abstracts only with that of full texts, as well as the use of section weighing combinations to assess their significance in the scientific article classification process, using SMO (Sequential Minimal Optimization), the Weka implementation of the Support Vector Machine (SVM) algorithm.”

 

It would look much better if the last paragraph of the introduction introduces the paper's whole structure, not just selected sections (2, 3.2, and 6).

Thank you for the thorough review. Indeed, the reference to the sections was incorrect.

The paragraph has been replaced by

“The rest of the paper is organized as follows. Section 2 presents the model to assign the weight of sections in the documents and calculate the weight of the terms in the sections. Section 3 describes the corpus used in the study and the text pre-processing techniques applied to the original data. Section 4 reports on the experiments made, Section 5 shows the main results achieved and, finally, Section 6 presents the most relevant conclusions of the study.”

 

The topic of the paper seems to be pretty hot, e.g., there is a survey from 2019 which got over 200 citations: Kowsari, Kamran, et al. "Text classification algorithms: A survey." Information 10.4 (2019): 150. But in this context, it is quite surprising that among the references, there are only 6 positions from the last five years, and there is no reference to any source from the previous two years. I understand that probably there might not be many sources concerning biomedical documents, but maybe other types of documents should be taken into consideration.

Some papers which I found concerning the importance of sections:

- Habib, R., & Afzal, M. T. (2019). Sections-based bibliographic coupling for research paper recommendation. Scientometrics, 119(2), 643-656.

- Li, T., & Lepage, Y. (2019). Informative sections and relevant words for the generation of NLP article abstracts. In AMJANLP (pp. 1281-1284).

- Collins, E., Augenstein, I., & Riedel, S. (2017). A supervised approach to extractive summarisation of scientific papers. arXiv preprint arXiv:1706.03946.

Thank you for your comments. We have included the three suggested bibliographic references in the state of the art, along with two others published in 2020, together with the following explanation:

“There are several papers that analyze the importance of sections. Habib and Afzal [ref] develop a method that recommends scientific papers similar to a given paper, assigning a weight to each reference based on its position within the sections of the paper.

In [ref], the authors study how different sections in scientific papers contribute to a summary and determine that there isn’t a definitive section from which summary sentences should be extracted.

Li and Lepage [ref] introduce a method which makes use of only some sections to generate a summary, and show that the Introduction and the Conclusion are the most useful sections to generate accurate abstracts.

Thijs [ref] proposes the use of a neural network architecture for word and paragraph embeddings (Doc2Vec) to measure similarity among these smaller units of analysis. It is shown that paragraphs in the Introduction and Discussion sections are more similar to the abstract, and that the similarity among paragraphs is related (though not linearly) to the distance between them. The Methodology section is the least similar to the other sections.

Finally, Hebler et al. [ref] provide recent results on the number of paragraphs per section in articles published in major medical journals, and investigate other structural elements (the number of tables, figures, and references, and the availability of supplementary material). The authors conclude that papers should follow the standard IMRAD (Introduction, Methods, Results And Discussion) structure to increase the likelihood of publication.”

The objective of our paper differs from these approaches, since none of them uses a section weighing scheme. Our objective is to demonstrate that combinations of weighted sections improve the classification of full-text documents.

 

As the authors had conducted related research with two experiments (titles and abstract vs. full text) using the same OHSUMED corpora, I suggest adding some brief comparison of this research to the previous one (which is indeed different but related): Gonçalves, C. A., Iglesias, E. L., Borrajo, L., Camacho, R., Vieira, A. S., & Gonçalves, C. T. (2019, May). Comparative study of feature selection methods for medical full text classification. In International Work-Conference on Bioinformatics and Biomedical Engineering (pp. 550-560). Springer, Cham.

In the mentioned work, the authors present LearnSec, a framework for full-text analysis developed to improve the classification process with propositional and relational learning. It allows preprocessing a document corpus, generating an attribute/value dataset in Weka format for propositional learning, and a First Order Logic dataset in Inductive Logic Programming (ILP) format for relational learning. Among its processing techniques, it includes the section-based term-weighting introduced in this paper.

 

In Table 3, maybe it could be useful to highlight somehow the best weighing combination for each corpus or create a heatmap that would clearly show the best combinations. 

The best weighing combination for each corpus has been highlighted.

 

In some of the corpora, e.g. c14, only few combinations reached kappa 0.6. It could be worth investigating if there is any difference between c14 and other corpora, such as  

Indeed, there is a significant difference in the results obtained for the c14 corpus. We investigated whether this could be due to the size of the corpus, the size of the vocabulary, or the distribution of the terms in the documents, but we have not found the reason.

 

In Table 1, 26 corpora are presented, but in Table 3, the results for only 12 of them are presented. If there was any selection process, it should be described. I can imagine that c21 or c24 can be excluded as there are not many relevant documents there, but what about c23?

There was no selection; the experiments were performed on all 26 corpora.

Table 3 presents the corpora for which the SMO classifier obtained a Kappa value higher than 0.60 at least once.

This explanation was included in the article (lines 286 and 287).
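For reference, the Kappa statistic used as the threshold above can be computed from a binary confusion matrix as follows (a generic sketch of the metric, not the paper's code):

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a binary confusion matrix.

    tp/fp/fn/tn: true/false positives and negatives.
    """
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n  # plain accuracy
    # agreement expected by chance, computed from the row/column marginals
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)
```

Unlike accuracy, Kappa discounts chance agreement, which matters on the imbalanced corpora considered here.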

 

Based on your research, is it possible to rank each individual section in terms of its usefulness in the classification process?

For the analyzed corpora, the Introduction section contributes significantly to increasing the performance of the SVM classifier. The combinations that include the Title and Abstract with a weight of 40%, and the Introduction with weights between 20% and 40%, obtain the best results, which are similar to (and on some occasions better than) those obtained with the full text.

This explanation was included in the article, in the Discussion section.

 

I also suggest adding in the discussion a description of the limitations of the presented research, maybe examples of the unsuccessful classification and its reasons. 

The main limitation of the present research is that to apply the term-weighting scheme to another corpus, the best combination needs to be calculated experimentally.

The automation of the weighing calculation process for each section would undoubtedly improve the system, allowing a better fit depending on each specific corpus, which is a clear future research line.

This explanation was included in the article, in the Discussion section.

 

 

Minor issues:

  • 1 "commnunity" => "community",
  • 1 "enriched datasets with text from certain sections achieves" => "enriched datasets with text from certain sections achieve",
  • The reference to [26] should be: Westergaard, David, et al. "A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts." PLoS computational biology 14.2 (2018): e1005962.

 

They have been corrected in the manuscript.

Round 2

Reviewer 2 Report

In Figure 1, the labels are too small to be read. I think it would be good to change to a different chart form or to enlarge the font.

Reviewer 3 Report

Dear Authors,

thank you for correcting the manuscript. In my opinion, the added sections have significantly increased the benefits of the article, and it can now be recommended for acceptance.

All the best,
Reviewer
