Classification of Left and Right Coronary Arteries in Coronary Angiographies Using Deep Learning
Round 1
Reviewer 1 Report
The work is complete and comprehensive, with convincing results. Would the authors provide some comments about integrating this research into an actual diagnostic situation?
Author Response
Response to Reviewer 1 Comments:
Dear reviewer
Thank you for taking the time to review this manuscript.
Point 1:
Comment: The work is complete and comprehensive, with convincing results. Would the authors provide some comments about integrating this research into an actual diagnostic situation?
Response: This is an excellent point. The findings presented in this paper can be used to prepare and curate CAG videos. The first important task is to distinguish between the left coronary artery (LCA) and the right coronary artery (RCA). Once LCA and RCA have been distinguished, separate models for automatic, quantitative stenosis assessment can be developed for each artery. These models must be trained and developed before this research can be integrated into a diagnostic situation (future work). After the models have been successfully trained and developed, they can be deployed in a diagnostic setting, where classifying LCA and RCA is essential to trigger the correct artery-specific model. We have tried to clarify the impact of this research and how to integrate it into a diagnostic situation.
Change in text: Please see lines 29-46, 60-61, 72-74, 86-89, and Figure 1.
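For illustration, a minimal sketch of this routing step, in which the LCA/RCA classifier selects the artery-specific stenosis model; all names are hypothetical placeholders, not the paper's code:

```python
# Hypothetical routing sketch: the LCA/RCA classifier decides which
# artery-specific stenosis model processes the video.
def assess_stenosis(video, artery_classifier, stenosis_models):
    artery = artery_classifier(video)      # assumed to return "LCA" or "RCA"
    return stenosis_models[artery](video)  # dispatch to the matching model
```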
Author Response File: Author Response.pdf
Reviewer 2 Report
Major concerns:
The authors used several different deep learning models to classify LCA and RCA. This is a typical classification task, and the technical innovation is overall very limited. Therefore, it is important to describe the clinical significance of this task. I do not think the authors explained the clinical importance of this classification task clearly. The authors should revise this part carefully; otherwise the significance of this work remains very unclear.
Although the authors were using existing model architectures, these models should be described in more detail.
In line 167, the author said, ‘using the previously described preprocessing in section 2.3’. However, the paper doesn’t have section 2.3.
Minor:
Line 176, 'non-linear relationship between the scanner parameters and LCA and RCA'. The expression is a bit strange. Between the parameters and the performance? That is definitely not a linear relationship.
The authors should carefully check the writing.
Author Response
Response to Reviewer 2 Comments:
Dear reviewer
Thank you for taking the time to review this manuscript.
Point 1:
Comment: The authors used several different deep learning models to classify LCA and RCA. This is a typical classification task, and the technical innovation is overall very limited. Therefore, it is important to describe the clinical significance of this task. I do not think the authors explained the clinical importance of this classification task clearly. The authors should revise this part carefully; otherwise the significance of this work remains very unclear.
Response: We see the point of being very specific and have clarified the clinical impact of this research.
Change in text: Please see lines 29-46, 60-61, 72-74, 86-89, and Figure 1.
Point 2:
Comment: Although the authors were using existing model architectures, these models should be described in more detail.
Response: Thank you for pointing this out. We acknowledge that the descriptions of the model architectures could be improved, and we have expanded them.
Change in text: Please see lines 119-120, 132-141, 145-153, 156-163 and Table 1.
Point 3:
Comment: In line 167, the author said, 'using the previously described preprocessing in section 2.3'. However, the paper does not have a Section 2.3.
Response: We apologize. This has been changed.
Change in text: Please see line 191.
Point 4:
Comment: Line 176, 'non-linear relationship between the scanner parameters and LCA and RCA'. The expression is a bit strange. Between the parameters and the performance? That is definitely not a linear relationship.
Response: We agree, and we have now removed the comment about the non-linear relationship between parameters and performance.
Change in text: Please see line 199.
Author Response File: Author Response.pdf
Reviewer 3 Report
This paper proposes a method for classifying left and right coronary arteries using deep learning networks. It has been tested on multiple datasets, all of which achieve good results, and it has clear practical application value. However, the main network framework is not clear enough in Figure 1: the feature layers of the deep learning network are not shown, and there is not much description in the following text. The main problem of this article is that the proposed framework is not clearly expressed. After careful review, the following comments are put forward:
1. In the dataset setting section, the dataset is divided into training and validation sets in a ratio of four to one. Please explain the significance of this ratio.
2. Why does the baseline section choose to use two MLPs, ignoring more mature methods such as SVM?
3. In the second part, models such as R(2+1)D and X3D are also mentioned. What is the role of these models in the overall framework, and how are they stacked with the MLP structure mentioned above?
4. In the experimental results section, the results obtained by the MVIT-32 model show large fluctuations compared with the results of other models. Please explain this.
Author Response
Response to Reviewer 3 Comments:
Dear reviewer
Thank you for taking the time to review this manuscript.
Point 1.
Comment: This paper proposes a method for classifying left and right coronary arteries using deep learning networks. It has been tested on multiple datasets, all of which achieve good results, and it has clear practical application value. However, the main network framework is not clear enough in Figure 1: the feature layers of the deep learning network are not shown, and there is not much description in the following text. The main problem of this article is that the proposed framework is not clearly expressed.
Response: Thank you for pointing this out. We have improved the description of the framework. We agree that visualizing the feature maps learned by the deep learning models would be interesting. However, this is difficult in practice, as deep learning models learn thousands of feature maps, ranging from low-level features (blobs and edges) to higher-level features, so it would be challenging to pick out the feature maps that matter for the model's decisions. Methods do exist for visualizing the feature maps of significance for the classification task; these are known as class activation maps, e.g., Grad-CAM [1]. However, such methods are not perfect and are not guaranteed to attend to the same features as humans [2], even when the models have high predictive performance.
[1] Selvaraju, Ramprasaath R., et al. "Grad-CAM: Visual explanations from deep networks via gradient-based localization." Proceedings of the IEEE International Conference on Computer Vision. 2017.
[2] Arun, Nishanth, et al. "Assessing the trustworthiness of saliency maps for localizing abnormalities in medical imaging." Radiology: Artificial Intelligence 3.6 (2021).
Change in text: Please see lines 29-46, 60-61, 72-74, 86-89, and Figure 1.
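For illustration, a minimal sketch of how a Grad-CAM-style class activation map could be computed in PyTorch; the 2D ResNet-18 backbone, the hooked layer, and the random input are hypothetical stand-ins (the paper's models are video networks), not the authors' code:

```python
import torch
import torch.nn.functional as F
import torchvision

# Minimal Grad-CAM sketch. The 2D ResNet-18 and its last conv block are
# illustrative; for a video model, a 3D feature layer would be hooked instead.
model = torchvision.models.resnet18(weights=None).eval()
target_layer = model.layer4[-1]

activations, gradients = {}, {}
target_layer.register_forward_hook(
    lambda m, i, o: activations.update(value=o.detach()))
target_layer.register_full_backward_hook(
    lambda m, gi, go: gradients.update(value=go[0].detach()))

x = torch.randn(1, 3, 224, 224)        # placeholder input frame
logits = model(x)
logits[0, logits.argmax()].backward()  # gradient of the top class score

# Weight each feature map by its average gradient, combine, and normalize.
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heat map in [0, 1]
```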
Point 2.
Comment: In the dataset setting section, the dataset is divided into training and validation sets in a ratio of four to one. Please explain the significance of this ratio.
Response: We constructed two datasets. The first dataset contains 3,545 videos used for training/validation. This dataset was split into 80% training and 20% validation, giving 2,836 videos for training and 709 videos for validation. It is important to allocate the greater portion to training, as the models need enough data to learn from; the validation set is used only during development and to guard against overfitting. We chose 80% for training and 20% for validation because it is common practice in machine learning [3]. The second dataset is used for testing, meaning we make no model selection based on it and only report results on it (so the dataset is unbiased). This dataset contains 520 videos.
[3] Gholamy, Afshin, Vladik Kreinovich, and Olga Kosheleva. "Why 70/30 or 80/20 relation between training and testing sets: a pedagogical explanation." (2018).
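For concreteness, a minimal sketch of such an 80/20 stratified split in scikit-learn; the identifiers and labels below are placeholders, not the actual dataset:

```python
from sklearn.model_selection import train_test_split

# Hypothetical placeholders standing in for the 3,545 training/validation
# videos and their LCA/RCA labels; not the authors' actual data loading.
video_ids = [f"video_{i:04d}" for i in range(3545)]
labels = ["LCA" if i % 2 == 0 else "RCA" for i in range(3545)]

# 80/20 split, stratified so both classes keep their proportions.
train_ids, val_ids, train_labels, val_labels = train_test_split(
    video_ids, labels, test_size=0.2, stratify=labels, random_state=42
)
print(len(train_ids), len(val_ids))  # 2836 709
```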
Point 3.
Comment: Why does the baseline section choose to use two MLPs, ignoring more mature methods such as SVM?
Response: Thank you for this comment. We have now added an SVM model to the baseline section. The baseline experiments aim to highlight the performance gains of non-linear over linear models with the scanner parameters as inputs. In other words, we would like to test the predictive power of the scanner parameters without using the videos as inputs. For the non-linear models, we chose neural networks with one hidden layer and with two hidden layers, as neural networks are highly flexible models: even a shallow network (e.g., a two-layer neural network) can approximate almost any function, particularly those of interest when modeling tabular data. Based on these observations, we believe that a one-layer and a two-layer neural network are well suited as baselines.
Change in text: Please see lines 127-129 and Table 2.
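As an illustration, a minimal sketch of such baselines in scikit-learn; the synthetic features, hidden-layer sizes, and hyperparameters are assumptions for demonstration, not the paper's configuration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the scanner-parameter features and LCA/RCA labels.
X, y = make_classification(n_samples=3545, n_features=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

baselines = {
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "mlp_one_hidden": make_pipeline(
        StandardScaler(), MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000)
    ),
    "mlp_two_hidden": make_pipeline(
        StandardScaler(), MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000)
    ),
}
for name, clf in baselines.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_val, y_val))  # validation accuracy per baseline
```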
Point 4.
Comment: In the second part, models such as R(2+1)D and X3D are also mentioned. What is the role of these models in the overall framework, and how are they stacked with the MLP structure mentioned above?
Response: The MLP structure mentioned above is used only for the baseline, to explore the predictive performance of the scanner parameters alone (no videos are used as input); it is not stacked with the video models. We experimented with three different model architectures to investigate which settings are important for training an LCA and RCA video classification model. Another reason for choosing three different architectures was to show robustness across architectures and avoid bias towards one specific model. We have improved the description of the models used for classifying coronary angiography videos.
Change in text: Please see lines 119-120, 132-141, 145-153, 156-163 and Table 1.
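For illustration, a minimal sketch of adapting one of the mentioned video backbones, R(2+1)D, to the binary LCA/RCA task; the torchvision model, clip shape, and label order are assumptions, not the authors' training code:

```python
import torch
import torchvision

# Hypothetical sketch: an R(2+1)D backbone with its classification head
# replaced for binary LCA/RCA prediction; not the authors' exact code.
model = torchvision.models.video.r2plus1d_18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # two classes: LCA, RCA

clip = torch.randn(1, 3, 16, 112, 112)  # (batch, channels, frames, height, width)
logits = model(clip)
pred = "LCA" if logits.argmax(dim=1).item() == 0 else "RCA"  # label order assumed
```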
Point 5.
Comment: In the experimental results section, the results obtained by the MVIT-32 model show large fluctuations compared with the results of other models. Please explain this.
Response: The fluctuations observed in Figure 2 are probably caused by the fact that Vision Transformer models tend to require more data than CNN models, so pretraining might be crucial for good performance. This pattern was also observed by the inventors of the Vision Transformer (Dosovitskiy et al.) [4], who argue that when training on a small or mid-size dataset, CNN models are superior to Vision Transformers because CNNs introduce inductive biases such as translation equivariance and locality. This picture changes, however, if the Vision Transformer has been pretrained on a much larger dataset and then applied to the same small or mid-size dataset; the hypothesis is that the Vision Transformer can learn the same kind of inductive bias from the data itself.
[4] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
Change in text: Please see lines 229-233.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
The authors have addressed my concerns.
Reviewer 3 Report
My concerns have been addressed.