Breast Lesions Screening of Mammographic Images with 2D Spatial and 1D Convolutional Neural Network-Based Classifier
Round 1
Reviewer 1 Report
In summary: Introduction is too long. There is not a clear objective. Not well organised. Table 1 must be self-explanatory. This table must explain the complete results. It is not clear what the improvement is over what is already done. No limitations in the Discussion. No comparison with other studies. The Conclusion must be rewritten.
Author Response
For Reviewer: #1
In summary: Introduction is too long. There is not a clear objective. Not well organized. Table 1 must be self-explanatory. This table must explain the complete results. It is not clear what the improvement is over what is already done. No limitations in the Discussion. No comparison with other studies. The Conclusion must be rewritten.
Response: Thank you for reminding us. Some sentences have been added in Introduction (Page#4), Section 2.4. (Page#10), Section 3.1. (Page#11), Section 3.3. (Pages#15 - #17), and Conclusion (Page#18).
- I am not able to understand the meaning of Figure 1 (features) or what the authors want to show with this figure. The paragraphs from 50 to 66 also need revision by an expert radiologist.
Response: Thank you for reminding us. Some sentences have been added in Introduction, Page#2.
Introduction, Page#2
… Hence, some studies [6, 9-12] extract feature patterns with a specific bounding box from suspicious mammographic images as original template patterns of the normal (Nor), benign (B), and malignant (M) classes, as shown in Figure 1. These morphological features are used to identify normality or abnormality for automatic breast tumor screening. …
- There are spelling errors.
Response: Thank you for reminding us. Corrected as suggested.
- Methodology. Authors have used the MIAS database, and this is ok, but also 118 mammograms, from where?
Response: Thank you for reminding us. Some sentences have been added in Section 2.1., Page#5.
Section 2.1., Page#5
… We collect the digital mammographic images from the MIAS image database (v1.21, 2015), comprising the original 322 images (161 pairs of right and left images) digitized at a 50 μm pixel edge with a linear response in the optical density range of 0.0−3.2 at 8 bits/pixel in portable gray map format [22, 23]. Each image filename consists of a three-digit serial number; l or r for the left and right breast, respectively; and s, m, l, or x denoting the image size. The most common image size, 4,320 pixels × 2,600 pixels, is selected for the proposed study in breast tumor screening. The clinical information is confirmed by expert radiologists for biomarkers such as image size, image category, background tissue (fatty, fatty glandular, and dense glandular), class of abnormality (calcification, masses, asymmetry, architectural distortion, and Nor), severity of abnormality (B and M classes), and location of the center of abnormality [22, 23]. A total of 59 subjects (35 normal and 24 abnormal) with 118 mammographic images (59 pairs of right and left images) are selected for experimental verification. According to the abnormality location, the ROI of each image is extracted with a 100 × 100 bounding box around the center, and a total of 500 feature patterns (200 Nor, 150 B, and 150 M) are extracted from the 118 images, as seen in the feature templates in Figure 1; these are available for training and validating the classifier at the learning and recalling stages. …
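For illustration, a minimal Python sketch of this ROI-extraction step (a 100 × 100 crop around the ground-truth abnormality centre); it assumes the MIAS convention that the centre (x, y) is measured from the bottom-left corner, and the file name and coordinates in the usage note are hypothetical:

```python
import numpy as np
from PIL import Image

def extract_roi(image_path, cx, cy, size=100):
    """Crop a size x size ROI centred on an abnormality.

    cx, cy follow the MIAS ground-truth convention: x from the left edge,
    y from the BOTTOM-left corner, so y is flipped into a row index.
    """
    img = np.asarray(Image.open(image_path).convert("L"))
    h, w = img.shape
    row, col = h - 1 - cy, cx                      # flip y to a row index
    half = size // 2
    top = max(0, min(row - half, h - size))        # clamp the box inside the image
    left = max(0, min(col - half, w - size))
    return img[top:top + size, left:left + size]

# Hypothetical usage with one ground-truth record (file, x, y):
# roi = extract_roi("mdb001.pgm", cx=535, cy=425)  # one 100 x 100 feature pattern
```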
Author Response File: Author Response.pdf
Reviewer 2 Report
This paper proposes a method based on 1D and 2D deep neural networks for the detection and classification of breast lesions. Another goal seems to be the identification of potential sites for these tumours to occur. In general, the paper needs to be reviewed by the authors.
Below I provide some more detailed comments:
- A substantial review of the text is necessary. There are several mistakes like "Breast mammography", "the results will indicate", "we will collect the data". Moreover, the organization doesn't follow the traditional sections used in scientific papers. The information is all scattered over sections.
- The paper also lacks a proper literature review.
- The dataset used is not clearly explained: the number of samples per class; the number of samples per lesion...
- The ROIs in Fig. 1 have a very bad contrast. The authors could enhance them just for visualization and mention this processing in the figure's caption.
- The BI-RADS description could be in a table for better visualization.
- Fig. 8 should report the Accuracy of Training and Validation, the same goes for the Loss.
- The paper doesn't present what is proposed. Where are the potential sites for lesions evaluated? The classification results are in two classes: normal and abnormal. What would they be? Malignant and benign? The authors have to explain this and review Fig. 4 accordingly.
- The results mentioned in the conclusion section do not make sense: "F1 score (>0.95), precision (%), index (>95%), and recall (%) index 549 (>95%)". What would the "index" be?
Author Response
For Reviewer: #2
This paper proposes a method based on 1D and 2D deep neural networks for the detection and classification of breast lesions. Another goal seems to be the identification of potential sites for these tumors to occur. In general, the paper needs to be reviewed by the authors. Below I provide some more detailed comments:
Response: Thank you for the reviewer's comments. The point-by-point responses to all comments are shown below.
- A substantial review of the text is necessary. There are several mistakes like "Breast mammography", "the results will indicate", "we will collect the data". Moreover, the organization doesn't follow the traditional sections used in scientific papers. The information is all scattered over sections.
Response: Thank you for reminding us. Corrected as suggested. Some sentences have been modified.
- The paper also lacks a proper literature review.
Response: Thank you for reminding us. Some references have been added in Introduction and Section 3.3., Pages#3-4 and #17.
Introduction, Pages#3-4
… The deep CNN-based methods combine multiple convolutional–pooling layers (>10 layers in a general configuration) and a classification layer to perform automatic end-to-end enhancement, noise filtering, feature extraction, and pattern recognition for the proposed topic, such as mass classification, lesion detection and localization, and lesion segmentation / ROI detection, by using the fully convolutional network (FCN), U-net CNN, region-based CNN (R-CNN), faster R-CNN, TTCNN (transferable texture convolutional neural network), and Grad-CAM (Gradient-Weighted Class Activation Mapping)-based CNN [31-39]. …
… The MP selects the brighter pixel values from the image within a specific pooling mask; hence, the dimension of the feature patterns can be effectively reduced, thereby overcoming overfitting problems when training with large datasets [40-42]. The ROI is localized using Grad-CAM, which also replaces the conventional fully connected layer with global average pooling (GAP) [39]. The feature patterns are then obtained through ReLU (Rectified Linear Unit) activation of the GAP-weighted sum of the feature maps [43]. Therefore, the classification accuracy of these multilayer structures can be improved for digital image classification. However, the limitations of such a multilayer classifier lie in determining the number of convolutional–pooling layers, the number of convolution kernels, and the sizes of the convolutional masks when setting the structure of the convolutional–pooling layers. Moreover, too many convolutional–pooling processes will result in the loss of spatial and edge information and are of no use for key feature extraction. …
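As a sketch of the GAP-and-ReLU combination described above (the Grad-CAM map of [39]): in full Grad-CAM, the per-map weights are the globally averaged gradients of the class score; here both activations and gradients are taken as given inputs:

```python
import numpy as np

def grad_cam_map(feature_maps, grads):
    """Grad-CAM-style localization map.

    feature_maps: (K, H, W) activations of the last convolutional layer
    grads:        (K, H, W) gradients of the class score w.r.t. those maps
    """
    weights = grads.mean(axis=(1, 2))                  # GAP over each gradient map
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum of feature maps
    return np.maximum(cam, 0.0)                        # ReLU keeps positive evidence
```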
Section 3.3., Page#17
… For example, TTCNN [38] comprised two convolutional layers (with a 5 × 5 kernel mask size and 16 and 32 kernel masks) followed by MP layers (with a 2 × 2 mask size and 16 and 32 MP masks), a third convolutional layer (with a 3 × 3 kernel mask size and 64 kernel masks), and a fully connected layer (classification layer). These multiple convolutional–pooling layers were used to select the deep features for improved image contrast (contrast adjustment), which limited the size of the output patterns and refined the classifier’s recognition ability. However, the model’s purpose and performance required the available training dataset to be continuously maintained, and excessive convolutional–pooling processes would degrade the position, orientation, and spatial relationships of the desired object. ML-based methods could be rapidly established but might be …
- Sarmad Maqsood, Robertas Damaševičius, and Rytis Maskeliūnas, “TTCNN: A breast cancer detection and classification towards computer-aided diagnosis using digital mammography in early stages,” Applied Sciences, vol. 12, 2022, pp. 1-17.
- Yong Joon Suh, Jaewon Jung, and Bum-Joo Cho, “Automated breast cancer detection in digital mammograms of various densities via deep learning,” Journal of Personalized Medicine, vol. 10, no. 211, 2020, pp. 1-11.
- R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” arXiv 2016, arXiv:1610.02391.
- The dataset used is not clearly explained: the number of samples per class; the number of samples per lesion.
Response: Thank you for reminding us. Some sentences have been added in Section 2.1., Page#5.
Section 2.1., Page#5
… We collect the digital mammographic images from the MIAS image database (v1.21, 2015), comprising the original 322 images (161 pairs of right and left images) digitized at a 50 μm pixel edge with a linear response in the optical density range of 0.0−3.2 at 8 bits/pixel in portable gray map format [22, 23]. Each image filename consists of a three-digit serial number; l or r for the left and right breast, respectively; and s, m, l, or x denoting the image size. The most common image size, 4,320 pixels × 2,600 pixels, is selected for the proposed study in breast tumor screening. The clinical information is confirmed by expert radiologists for biomarkers such as image size, image category, background tissue (fatty, fatty glandular, and dense glandular), class of abnormality (calcification, masses, asymmetry, architectural distortion, and Nor), severity of abnormality (B and M classes), and location of the center of abnormality [22, 23]. A total of 59 subjects (35 normal and 24 abnormal) with 118 mammographic images (59 pairs of right and left images) are selected for experimental verification. According to the abnormality location, the ROI of each image is extracted with a 100 × 100 bounding box around the center, and a total of 500 feature patterns (200 Nor, 150 B, and 150 M) are extracted from the 118 images, as seen in the feature templates in Figure 1; these are available for training and validating the classifier at the learning and recalling stages. …
- The ROIs in Fig. 1 have a very bad contrast. The authors could enhance them just for visualization and mention this processing in the figure's caption.
Response: Thank you for reminding us. Some sentences have been added in Introduction, Pages#2 and #4.
Introduction, Pages#2 and #4
… Hence, some studies [6, 9-12] extract feature patterns with a specific bounding box from suspicious mammographic images as original template patterns of the normal (Nor), benign (B), and malignant (M) classes, as shown in Figure 1. These morphological features are used to identify normality or abnormality for automatic breast tumor screening. …
… The II process performs spatial convolutions by using the summed area table (SAT) [44-46] to detect line and diagonal edge features. The II-based convolutional process does not require specifying the convolutional mask’s parameters and sizes. Therefore, after the 2D spatial convolutional process, via image enhancement (adjusting the contrast while maintaining the features), the possible lesion can be easily detected and located in an ROI; with the specific bounding box, the “feature pattern” can then be easily picked out from the original mammographic image. Then, after converting the 2D feature pattern into a “1D feature vector” by the flattening process, multi-round 1D convolutional processes subsequently enhance the incoming feature vector as a feature signal, which also increases the significant characteristics for further feature extraction and classification. …
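The SAT idea behind the II-based convolution can be sketched as follows: once the table is built, any rectangular sum costs four look-ups, so no mask weights or sizes need to be specified. This is a generic illustration, not the paper's exact II operator:

```python
import numpy as np

def summed_area_table(img):
    """SAT: sat[i, j] = sum of img[:i+1, :j+1]."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def box_sum(sat, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] from four SAT look-ups."""
    total = sat[r1, c1]
    if r0 > 0:
        total -= sat[r0 - 1, c1]
    if c0 > 0:
        total -= sat[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += sat[r0 - 1, c0 - 1]
    return total
```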
- The BI-RADS description could be in a table for better visualization.
Response: Thank you for reminding us. Corrected as suggested. Table 1 has been added in Introduction, Page#3.
… The morphological descriptors of the Breast Imaging-Reporting and Data System (BI-RADS), with seven assessment categories, are used to characterize lesions [11, 16, 17], and the assessment of BI-RADS categories for mammogram classification is shown in Table 1. …
- Fig. 8 should report the Accuracy of Training and Validation, the same goes for the Loss.
Response: Thank you for reminding us. Some sentences have been added in Section 3.2., Page#13.
Section 3.2., Page#13
… It could be seen that the training history curve reached saturation over 400 training epochs in the training stage; thus, a classification accuracy of 97% could be obtained, gradually guaranteeing that the convergence condition was reached. Finally, the training convergence curve converged, and the value of the loss function was 0.091, as seen in Figure 8(b). …
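A minimal plotting sketch of the report requested by the reviewer (training vs. validation accuracy and loss per epoch); the dictionary keys assume Keras-style history records and are otherwise hypothetical:

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training vs. validation accuracy and loss from a history dict."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history["accuracy"], label="training")
    ax1.plot(history["val_accuracy"], label="validation")
    ax1.set_xlabel("epoch"); ax1.set_ylabel("accuracy"); ax1.legend()
    ax2.plot(history["loss"], label="training")
    ax2.plot(history["val_loss"], label="validation")
    ax2.set_xlabel("epoch"); ax2.set_ylabel("loss"); ax2.legend()
    fig.tight_layout()
    return fig
```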
- The paper doesn't present what is proposed. Where are the potential sites for lesions evaluated? The classification results are in two classes: normal and abnormal. What would they be? Malignant and benign? The authors have to explain this and review Fig. 4 accordingly.
Response: Thank you for reminding us. Some sentences have been added in Introduction and Section 2.4, Pages#4 and #9-#10.
Introduction, Page#4
… The II process performs spatial convolutions by using the summed area table (SAT) [44-46] to detect line and diagonal edge features. The II-based convolutional process does not require specifying the convolutional mask’s parameters and sizes. Therefore, after the 2D spatial convolutional process, via image enhancement (adjusting the contrast while maintaining the features), the possible lesion can be easily detected and located in an ROI; with the specific bounding box, the “feature pattern” can then be easily picked out from the original mammographic image. Then, after converting the 2D feature pattern into a “1D feature vector” by the flattening process, multi-round 1D convolutional processes subsequently enhance the incoming feature vector as a feature signal, which also increases the significant characteristics for further feature extraction and classification. …
… The proposed 1D convolutional operations use simpler linear weighted mathematical sums to deal with the incoming subsequent feature signals and can remove unwanted noise. Additionally, the 1D kernel convolutional process can quantify the difference levels in feature signals for separating the Nor class from the B and M classes. In real-time applications, this simple architecture is easy to implement for the intended medical purpose. In the classification layer, by feeding the 1D feature pattern into the input layer of the gray relational analysis (GRA)-based classifier [49-50], the mammographic classification of breast lesions can be identified, including the Nor, B, and M classes. In the experimental validations, mammographic images are collected from the MIAS database [22-23], including training and testing datasets for training the proposed classifier and validating the classifier’s performance in clinical applications. Using K-fold cross-validation, the experimental results will show the classifier’s efficiency for automatic breast lesion screening, with precision (%), recall (%), accuracy (%), and F1 score indices [40-42]. …
Section 2.4, Pages#9-#10
… where the parameters wkj are the connecting weighting values, referring the input feature signal to the desired class, between the GRA layer and the summation layer; they can be set by the K × 3 (m = 3, three classes in this study) output training data, encoded as value “1” or value “0”. The output pattern vector for the three classes is encoded as [Nor, B, M] = [0/1, 0/1, 0/1], with the binary encodings (1) Class-Nor: [1, 0, 0], (2) Class-B: [0, 1, 0], and (3) Class-M: [0, 0, 1]. The classifier’s output, Yj, j = 1, 2, 3, is decided by a threshold value of 0.5 to identify disease present (value 1) or disease absent (value 0). Hence, we can fulfill our medical purpose by establishing a classifier for automatic multi-label classification, consisting of the II-based convolutional process in the 1st convolutional layer; two 1D convolutional processes in the 2nd and 3rd convolutional layers (with a discrete Gaussian mask, data length = 200, and stride = 1); a 1D pooling layer (stride = 100); and the GRA-based classifier in the classification layer, as seen in Table 2. As seen in Figure 7, the flowchart of the classifier’s testing and validation includes image enhancement and denoising with the II-based spatial convolutional process, feature pattern extraction, the flattening process, the two-round 1D convolutional process, the 1D pooling process, and breast lesion screening, while maintaining its medical purpose in clinical application. …
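A minimal sketch of this GRA-based classification layer, using the three one-hot class codes and the 0.5 decision threshold stated above; the gray relational grade uses the customary distinguishing coefficient ζ = 0.5, and the exact summation-layer weighting of [49-50] may differ:

```python
import numpy as np

def gra_grades(x, references, zeta=0.5):
    """Gray relational grade between input vector x and each reference pattern."""
    diffs = np.abs(references - x)                # (K, n) absolute differences
    dmin, dmax = diffs.min(), diffs.max()
    coeff = (dmin + zeta * dmax) / (diffs + zeta * dmax)
    return coeff.mean(axis=1)                     # one grade per reference pattern

def classify(x, references, codes, threshold=0.5):
    """codes: (K, 3) one-hot rows, e.g. Nor=[1,0,0], B=[0,1,0], M=[0,0,1]."""
    g = gra_grades(x, references)
    y = g @ codes / g.sum()                       # grade-weighted vote per class
    return (y > threshold).astype(int)            # [Nor, B, M] decision vector
```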
- The results mentioned in the conclusion section do not make sense: "F1 score (>0.95), precision (%), index (>95%), and recall (%) index 549 (>95%)". What would the "index" be?
Response: Thank you for reminding us. The sentence has been modified in Conclusion, Page#18.
Conclusion, Page#18
… Through 10-fold cross-validation tests, we obtained promising results for screening breast lesions, with a high mean F1 score (0.9641), precision (96.70%), and recall (96.13%) for separating the Nor class from the B and M classes. The proposed screening model overcomes limitations such as parameter assignment, parameter adjustment, iterative computation, and optimization algorithm requirements. Its training scheme has the adaptive capability to retrain the classifier with new image datasets, such as clinical images, the DDSM database, or the INbreast database, in less computation time, which can continuously maintain its intended medical purpose and can also establish a pattern recognition scheme as a software-as-a-medical-device (SaMD) tool. Therefore, we suggest that the proposed screening model could replace manual inspection and traditional CNN methods for specific tasks requiring expertise and experience in medical image examinations. …
Author Response File: Author Response.pdf
Reviewer 3 Report
The paper proposes a two-dimension (2D) spatial and one-dimension (1D) convolutional neural network (CNN) for the classification of breast lesions using mammogram images. The proposed approach is evaluated on a benchmark MIAS dataset of mammography images. The paper needs to be revised and improved to address the suggestions and comments presented below, before the paper could be considered for publication.
1. The authors need to explicitly state their novelty and differences from the previous studies. What are your contributions to the research field?
2. The discussion in the related works section lacks focus. Several works using machine learning and deep learning techniques are discussed, but without any particular order or structure, which makes following the analysis difficult. I suggest focusing on the recent works using deep learning methods. The authors are encouraged to discuss, for example, TTCNN: A Breast Cancer Detection and Classification towards Computer-Aided Diagnosis Using Digital Mammography in Early Stages; Breast cancer detection using mammogram images with improved multi-fractal dimension approach and feature fusion; and Dilated semantic segmentation for breast ultrasonic lesion detection using parallel feature fusion. A summary of the limitations of previous studies is expected as a motivation for the current paper.
3. What do you mean by feature signals or feature patterns? Present a definition.
4. Present a full list of hyperparameter values (such as batch size) of the deep learning model. How did you select the hyperparameter values for CNN training? Did you use any hyperparameter optimisation/finetuning methods?
5. How do specific components of your proposed methodology influence the performance? Present the results of an ablation study.
6. Evaluate the computational complexity (rather than execution time which is computer-dependent) of the proposed methodology.
7. Discuss the explainability of the results using for example GRAD-CAM activation maps.
8. Discuss the limitations of your approach, such as the small database used for evaluation.
9. Revise and extend the conclusions. Currently, they are too short and do not represent the work done and the outcomes contained in this study. Go beyond the summary of works done. Enlist the specific advantages of your method over similar methods. Use the main numerical results from your experiments to support your claims. What are the implications of your study for further research in the domain of biomedical imaging?
Author Response
For Reviewer: #3
The paper proposes a two-dimension (2D) spatial and one-dimension (1D) convolutional neural network (CNN) for the classification of breast lesions using mammogram images. The proposed approach is evaluated on a benchmark MIAS dataset of mammography images. The paper needs to be revised and improved to address the suggestions and comments presented below, before the paper could be considered for publication.
Response: Thank you for the reviewer's comments. The point-by-point responses to all comments are shown below.
- The authors need to explicitly state their novelty and differences from the previous studies. What are your contributions to the research field?
Response: Thank you for reminding us. Some sentences have been added in Introduction (Page#4), Section 3.3. (Pages#15-#17), and Conclusion (Page#18).
- The discussion in the related works section lacks focus. Several works using machine learning and deep learning techniques are discussed, but without any particular order or structure, which makes following the analysis difficult. I suggest focusing on the recent works using deep learning methods. The authors are encouraged to discuss, for example, TTCNN: A Breast Cancer Detection and Classification towards Computer-Aided Diagnosis Using Digital Mammography in Early Stages; Breast cancer detection using mammogram images with improved multi-fractal dimension approach and feature fusion; and Dilated semantic segmentation for breast ultrasonic lesion detection using parallel feature fusion. A summary of the limitations of previous studies is expected as a motivation for the current paper.
Response: Thank you for reminding us. Some sentences have been added in Introduction (Page#4) and Section 3.3. (Pages#15-#17).
Introduction, Page#4
… Hence, we intend to design a 2D spatial and 1D CNN-based classifier to simplify the tasks of image enhancement, feature extraction, and pattern recognition, consisting of a two-dimensional (2D) spatial convolutional layer, a flattening process layer, one-dimensional (1D) convolutional layers, a pooling process layer, and a classification layer, which are integrated into an individual multilayer classifier for breast lesion screening. In the 2D spatial convolutional layer, fractional-order-based convolution, Grad-CAM activation mapping, or integral image (II) operations [42-46] can be employed to perform the convolutional processes to detect the desired object’s edges and contours in a specific region along the horizontal and vertical directions. Different feature patterns can be extracted through convolutional operations by using different filtering mask weights and mask size assignments, such as 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, and so on. Hence, these extracted feature patterns can be used in studies of mammographic classification and to identify breast lesions. However, the fractional-order-based masks require selecting suitable fractional-order parameters (v ∈ (0, 1)) and convolutional mask sizes [42, 47-48] to extract different aspects (horizontal, vertical, or diagonal edges) and useful features from the input images. The II process performs spatial convolutions by using the summed area table (SAT) [44-46] to detect line and diagonal edge features. The II-based convolutional process does not require specifying the convolutional mask’s parameters and sizes. After the 2D spatial convolutional process, via image enhancement (adjusting the contrast while maintaining the features), the possible lesion can be easily detected and located in an ROI; with the specific bounding box, the “feature pattern” can be easily picked out from the original mammographic image. Then, after converting the 2D feature pattern into a “1D feature vector” by the flattening process, multi-round 1D convolutional processes subsequently enhance the incoming feature vector as a feature signal, which also increases the significant characteristics for further feature extraction and classification. …
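A minimal sketch of a fractional-order edge mask of the kind discussed above, built from the first three Grünwald–Letnikov coefficients (c0 = 1, c1 = -v, c2 = (v^2 - v)/2); this is one common construction, and the exact masks of [42, 47-48] may differ:

```python
import numpy as np
from scipy.signal import convolve2d

def gl_coeffs(v, n=3):
    """First n Grünwald–Letnikov fractional-difference coefficients of order v."""
    c = [1.0]
    for k in range(1, n):
        c.append(c[-1] * (k - 1 - v) / k)   # c_k = c_{k-1} * (k - 1 - v) / k
    return np.array(c)

def fractional_edges(img, v=0.35):
    """Approximate v-order derivatives along x and y; magnitude highlights edges."""
    c = gl_coeffs(v)                        # e.g. [1, -0.35, -0.11375] for v = 0.35
    kx = c[np.newaxis, :]                   # 1 x 3 horizontal differencing mask
    ky = kx.T                               # 3 x 1 vertical differencing mask
    gx = convolve2d(img, kx, mode="same", boundary="symm")
    gy = convolve2d(img, ky, mode="same", boundary="symm")
    return np.hypot(gx, gy)
```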
Section 3.3., Pages#15-#17
… The performance of the proposed classifier was superior to that of the traditional 2D CNN-based classifier in design cycle, screening accuracy, parameter assignment (including convolutional masks and the BPNN’s network parameters), parameter adjustment, computational complexity level (iterative computations), and computational time. The BPNN’s optimal parameters had to be determined by the ADAM algorithm in the training stage and were updated by adjusting the network parameter, decay parameter, learning rate, and attenuation rate to minimize the error rate. Additionally, …
… For example, TTCNN [38] comprised two convolutional layers (with a 5 × 5 kernel mask size and 16 and 32 kernel masks) followed by MP layers (with a 2 × 2 mask size and 16 and 32 MP masks), a third convolutional layer (with a 3 × 3 kernel mask size and 64 kernel masks), and a fully connected layer (classification layer). These multiple convolutional–pooling layers were used to select the deep features for improved image contrast (contrast adjustment), which limited the size of the output patterns and refined the classifier’s recognition ability. However, the model’s purpose and performance required the available training dataset to be continuously maintained, and excessive convolutional–pooling processes would degrade the position, orientation, and spatial relationships of the desired object. ML-based methods …
- What do you mean by feature signals or feature patterns? Present a definition.
Response: Thank you for reminding us. Some sentences have been added in Introduction and Section 2.3., Pages#3 and #7.
Introduction, Page#3
… The II process performs spatial convolutions by using the summed area table (SAT) [44-46] to detect line and diagonal edge features. The II-based convolutional process does not require specifying the convolutional mask’s parameters and sizes. Therefore, after the 2D spatial convolutional process, via image enhancement (adjusting the contrast while maintaining the features), the possible lesion can be easily detected and located in an ROI; with the specific bounding box, the “feature pattern” can then be easily picked out from the original mammographic image. Then, after converting the 2D feature pattern into a “1D feature vector” by the flattening process, multi-round 1D convolutional processes subsequently enhance the incoming feature vector as a feature signal, which also increases the significant characteristics for further feature extraction and classification. …
Section 2.3., Page#7
… where Ixy(p, q) are the II values within the ROI, p = 1, 2, 3, …, n, and q = 1, 2, 3, …, n (n = 100 in this study); FLAT(·) is the flattening operator; and FLATIx is the 1D data stream vector serving as the feature signal. Then, multi-round 1D convolutional operators are applied to the incoming feature signal via Xc[i] = Xc-1[i] * Hc[j] (the symbol “*” denotes the convolution operator), which can be presented in discrete-time form [40-41, 52]: …
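A minimal sketch of the flattening and multi-round 1D convolution defined above; the spread of the discrete Gaussian mask is an assumption, since only its length (200) and stride (1) are stated in Section 2.4:

```python
import numpy as np

def gaussian_mask(length=200, sigma=None):
    """Discrete Gaussian mask of the stated length; sigma is an assumed spread."""
    sigma = sigma or length / 6.0
    t = np.arange(length) - (length - 1) / 2.0
    h = np.exp(-0.5 * (t / sigma) ** 2)
    return h / h.sum()                          # unit-sum smoothing mask

def feature_signal(roi, rounds=2):
    """Flatten a 2D feature pattern, then apply multi-round 1D convolutions."""
    x = roi.astype(float).ravel()               # FLAT(.): 2D pattern -> 1D vector
    h = gaussian_mask()
    for _ in range(rounds):
        x = np.convolve(x, h, mode="same")      # X_c[i] = X_{c-1}[i] * H_c[j]
    return x
```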
- Present a full list of hyperparameter values (such as batch size) of the deep learning model. How did you select the hyperparameter values for CNN training? Did you use any hyperparameter optimization / finetuning methods?
Response: Thank you for reminding us. Some sentences have been added in Section 2.4., Section 3.1. and Section 3.3., Pages#10, #15, and #17.
Section 2.4, Page#10
… Hence, we can fulfill our medical purpose by establishing a classifier for automatic multi-label classification, consisting of the II-based convolutional process in the 1st convolutional layer; two 1D convolutional processes in the 2nd and 3rd convolutional layers (with a discrete Gaussian mask, data length = 200, and stride = 1); a 1D pooling layer (stride = 100); and the GRA-based classifier in the classification layer, as seen in Table 2. As seen in Figure 7, the flowchart of the classifier’s testing and validation includes image enhancement and denoising with the II-based spatial convolutional process, feature pattern extraction, the flattening process, the two-round 1D convolutional process, the 1D pooling process, and breast lesion screening, while maintaining its medical purpose in clinical application. …
Section 3.1, Page#10
… This study compared the proposed classifier with the traditional 2D CNN-based classifier in terms of training time, accuracy, and classifier performance. As seen in Table 2, we established the two multilayer classifiers by using different numbers of convolutional–pooling layers, different types of convolutional masks, and different sizes of convolutional masks. We adopted a 2D spatial convolutional layer, two convolutional–pooling layers, a flattening layer, and a classification layer [42, 53] to establish a multilayer classifier. Two 3 × 3 fractional-order masks were used to perform the 2D spatial convolutional processes to enhance the edge information of possible breast lesions; the fractional-order parameter v = 0.30–0.40 provided promising results for feature enhancement (v = 0.35 was selected in our study [53]). The number of kernel convolutional masks and maximum pooling (MP) masks was set to 16 in the 2nd and 3rd convolutional–pooling layers, respectively. The sizes of the kernel convolutional masks and MP masks were set to 3 × 3 and 2 × 2, respectively. Two kernel convolutional–pooling processes were used to extract the desired object’s feature pattern and to reduce the dimensions of the feature patterns with the MP process, yielding abstract features. Each kernel mask moved across columns and rows in steps of 1 (stride = 1) at each convolutional operation, and the padding parameter was set to 1 to maintain the feature pattern size (padding = 1). Each MP mask moved with a stride of 2 (stride = 2). The MP processes could overcome the overfitting problem when training a multilayer classifier. In the classification layer, for a back-propagation neural network (BPNN) with 1 input layer (625 nodes), a 1st hidden layer (168 nodes), a 2nd hidden layer (64 nodes), and 1 output layer (3 nodes), the adaptive moment estimation (ADAM) method, a gradient-descent-based optimization algorithm for back-propagation training [53-54], was used to adjust the BPNN’s connecting weights and determine the optimal parameters to raise the classifier’s accuracy. …
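A minimal Keras sketch of the baseline 2D CNN-based classifier configured as above (16 kernels of 3 × 3, 2 × 2 MP, stride = 1, padding = 1, hidden layers of 168 and 64 nodes, ADAM); the input size is assumed to be the 100 × 100 ROI, so the flattened dimension does not reproduce the stated 625 input nodes exactly, and the 2D fractional-order spatial layer is treated as preprocessing:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_baseline(input_shape=(100, 100, 1)):      # assumed ROI size (Section 2.1)
    """Keras sketch of the baseline 2D CNN described in Section 3.1."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(2, strides=2),
        layers.Conv2D(16, 3, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(2, strides=2),
        layers.Flatten(),
        layers.Dense(168, activation="relu"),       # 1st hidden layer
        layers.Dense(64, activation="relu"),        # 2nd hidden layer
        layers.Dense(3, activation="softmax"),      # [Nor, B, M] output layer
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```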
Section 3.3, Pages#15 and #17
… We developed the 2D spatial and 1D CNN-based classifier with mammographic image classification to screen for disease presence as normality (Nor) or abnormality (B and M classes). For the MIAS image database [22-23], with the 10-fold cross-validation tests, as seen in Table 4, the experimental results indicated an average precision of 96.70%, an average recall of 96.13%, an average accuracy of 96.40%, and an average F1 score of 0.9641, quantifying the classification performance for identifying breast lesions. The performance of the proposed classifier was superior to that of the traditional 2D CNN-based classifier in design cycle, screening accuracy, parameter assignment (including convolutional masks and the BPNN’s network parameters), parameter adjustment, computational complexity level (iterative computations), and computational time. The BPNN’s optimal parameters had to be determined by the ADAM algorithm in the training stage and were updated by adjusting the network parameter, decay parameter, learning rate, and attenuation rate to minimize the error rate. Additionally, clustering methods, ML methods, and DL-based methods have been used to build different classifier models for clinical / medical purposes, including breast density estimation, mass detection / mass segmentation, mammogram classification / breast lesion screening, and automated breast cancer detection [24-29, 38-39, 60-62], as seen in Table 5. …
… In contrast to the ML-based methods, the DL-based methods, such as the TTCNN, Grad-CAM CNN, DNN (Deep Neural Network), FCN (Fully Convolutional Network), Attention Dense-Unet, and Dense-Unet models [26-29, 38-39, 61], had more complex schemes for setting up the classifiers with minimal expert intervention, using large volumes of unstructured data to train a classifier for classification or detection purposes. For example, TTCNN [38] comprised two convolutional layers (with a 5 × 5 kernel mask size and 16 and 32 kernel masks) followed by MP layers (with a 2 × 2 mask size and 16 and 32 MP masks), a third convolutional layer (with a 3 × 3 kernel mask size and 64 kernel masks), and a fully connected layer (classification layer). These multiple convolutional–pooling layers were used to select the deep features for improved image contrast (contrast adjustment), which limited the size of the output patterns and refined the classifier’s recognition ability. However, the model’s purpose and performance required the available training dataset to be continuously maintained, and excessive convolutional–pooling processes would degrade the position, orientation, and spatial relationships of the desired object. ML-based methods could be rapidly established but might be limited in their results for their applications; DL-based methods required more time to set up the model but could rapidly produce results, with promising classification accuracy from the multiple convolutional processes. In addition, these models required GPU hardware resources to perform the multiple convolutional–pooling processes and the classifier’s training tasks. Therefore, we integrated the 2D spatial and 1D CNN-based classifier to simplify the multiple 2D convolutional processes and reduce the computational complexity level. In the classification layer, the GRA-based classifier used straightforward mathematical operations, without optimization / fine-tuning algorithms or iterative computations, to perform the training and pattern recognition tasks. The advantages of the proposed classifier are listed below:
- the possible breast lesions’ spatial and edge information could be enhanced by the II-based spatial convolutional process in the first convolutional layer, which helped to easily locate the ROI and extract feature patterns from the original mammographic image;
- the suitable two-round 1D convolutional processes could quantify the difference levels, which helped to preliminarily separate the Nor class from the B and M classes;
- the dimension of the feature signals could be reduced by the 1D pooling process, which helped to overcome the classifier’s overfitting problems in the training stage;
- straightforward mathematical operations performed the training and pattern recognition tasks;
- the optimal parameters updated in the training stage did not require convergence condition assignment or parameter adjustment;
- determining the network parameters did not require complex iterative computations or optimization algorithms;
- the classification accuracy could be obtained in less computation time, making it feasible to replace manual screening that requires specific expertise and experience.
- How do specific components of your proposed methodology influence the performance? Present the results of an ablation study.
Response: Thank you for reminding us. Some sentences have been added in Introduction and Section 3.3., Pages#4, #15, and #17.
Introduction, Page#4
… Hence, we intend to design a 2D spatial and 1D CNN-based classifier to simplify the tasks of image enhancement, feature extraction, and pattern recognition, consisting of a two-dimensional (2D) spatial convolutional layer, a flattening process layer, one-dimensional (1D) convolutional layers, a pooling process layer, and a classification layer, which are integrated into an individual multilayer classifier for breast lesion screening. In the 2D spatial convolutional layer, fractional-order-based convolution, Grad-CAM activation mapping, or integral image (II) operations [42-46] can be employed to perform the convolutional processes to detect the desired object’s edges and contours in a specific region along the horizontal and vertical directions. Different feature patterns can be extracted through convolutional operations by using different filtering mask weights and mask size assignments, such as 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, and so on. Hence, these extracted feature patterns can be used in studies of mammographic classification and to identify breast lesions. However, the fractional-order-based masks require selecting suitable fractional-order parameters (v ∈ (0, 1)) and convolutional mask sizes [42, 47-48] to extract different aspects (horizontal, vertical, or diagonal edges) and useful features from the input images. The II process performs spatial convolutions by using the summed area table (SAT) [44-46] to detect line and diagonal edge features. The II-based convolutional process does not require specifying the convolutional mask’s parameters and sizes. After the 2D spatial convolutional process, via image enhancement (adjusting the contrast while maintaining the features), the possible lesion can be easily detected and located in an ROI; with the specific bounding box, the “feature pattern” can be easily picked out from the original mammographic image. Then, after converting the 2D feature pattern into a “1D feature vector” by the flattening process, multi-round 1D convolutional processes subsequently enhance the incoming feature vector as a feature signal, which also increases the significant characteristics for further feature extraction and classification. …
Section 3.3., Pages#15 and #17
… We developed the 2D spatial and 1D CNN-based classifier with mammographic image classification to screen for disease presence as normality (Nor) or abnormality (B and M classes). For the MIAS image database [22-23], with the 10-fold cross-validation tests, as seen in Table 4, the experimental results indicated an average precision of 96.70%, an average recall of 96.13%, an average accuracy of 96.40%, and an average F1 score of 0.9641, quantifying the classification performance for identifying breast lesions. The performance of the proposed classifier was superior to that of the traditional 2D CNN-based classifier in design cycle, screening accuracy, parameter assignment (including convolutional masks and the BPNN’s network parameters), parameter adjustment, computational complexity level (iterative computations), and computational time. The BPNN’s optimal parameters had to be determined by the ADAM algorithm in the training stage and were updated by adjusting the network parameter, decay parameter, learning rate, and attenuation rate to minimize the error rate. Additionally, clustering methods, ML methods, and DL-based methods have been used to build different classifier models for clinical / medical purposes, including breast density estimation, mass detection / mass segmentation, mammogram classification / breast lesion screening, and automated breast cancer detection [24-29, 38-39, 60-62], as seen in Table 5. …
… In contrast to the ML-based methods, the DL-based methods, such as the TTCNN, Grad-CAM CNN, DNN (Deep Neural Network), FCN (Fully Convolutional Network), Attention Dense-Unet, and Dense-Unet models [26-29, 38-39, 61], had more complex schemes for setting up the classifiers with minimal expert intervention, using large volumes of unstructured data to train a classifier for classification or detection purposes. For example, TTCNN [38] comprised two convolutional layers (with a 5 × 5 kernel mask size and 16 and 32 kernel masks) followed by MP layers (with a 2 × 2 mask size and 16 and 32 MP masks), a third convolutional layer (with a 3 × 3 kernel mask size and 64 kernel masks), and a fully connected layer (classification layer). These multiple convolutional–pooling layers were used to select the deep features for improved image contrast (contrast adjustment), which limited the size of the output patterns and refined the classifier’s recognition ability. However, the model’s purpose and performance required the available training dataset to be continuously maintained, and excessive convolutional–pooling processes would degrade the position, orientation, and spatial relationships of the desired object. ML-based methods could be rapidly established but might be limited in their results for their applications; DL-based methods required more time to set up the model but could rapidly produce results, with promising classification accuracy from the multiple convolutional processes. In addition, these models required GPU hardware resources to perform the multiple convolutional–pooling processes and the classifier’s training tasks. Therefore, we integrated the 2D spatial and 1D CNN-based classifier to simplify the multiple 2D convolutional processes and reduce the computational complexity level. In the classification layer, the GRA-based classifier used straightforward mathematical operations, without optimization / fine-tuning algorithms or iterative computations, to perform the training and pattern recognition tasks. The advantages of the proposed classifier are listed below:
- the possible breast lesions’ spatial and edge information could be enhanced by the II-based spatial convolutional process in the first convolutional layer, which helped to easily locate the ROI and extract feature patterns from the original mammographic image;
- the suitable two-round 1D convolutional processes could quantify the difference levels, which helped to preliminarily separate the Nor class from the B and M classes;
- the dimension of the feature signals could be reduced by the 1D pooling process, which helped to overcome the classifier’s overfitting problems in the training stage;
- straightforward mathematical operations performed the training and pattern recognition tasks;
- the optimal parameters updated in the training stage did not require convergence condition assignment or parameter adjustment;
- determining the network parameters did not require complex iterative computations or optimization algorithms;
- the classification accuracy could be obtained in less computation time, making it feasible to replace manual screening that requires specific expertise and experience.
- Evaluate the computational complexity (rather than execution time which is computer-dependent) of the proposed methodology.
Response: Thank you for reminding us. Some sentences have been added in Introduction and Section 3.3., Pages#4, #15, and #17.
Introduction, Page#4
… Hence, we intend to design a 2D spatial and 1D CNN-based classifier to simplify the tasks of image enhancement, feature extraction, and pattern recognition, consisting of a two-dimensional (2D) spatial convolutional layer, a flattening process layer, one-dimensional (1D) convolutional layers, a pooling process layer, and a classification layer, which are integrated into an individual multilayer classifier for breast lesion screening. In the 2D spatial convolutional layer, fractional-order-based convolution, Grad-CAM activation mapping, or integral image (II) operations [42-46] can be employed to perform the convolutional processes to detect the desired object’s edges and contours in a specific region along the horizontal and vertical directions. Different feature patterns can be extracted through convolutional operations by using different filtering mask weights and mask size assignments, such as 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, and so on. Hence, these extracted feature patterns can be used in studies of mammographic classification and to identify breast lesions. However, the fractional-order-based masks require selecting suitable fractional-order parameters (v ∈ (0, 1)) and convolutional mask sizes [42, 47-48] to extract different aspects (horizontal, vertical, or diagonal edges) and useful features from the input images. The II process performs spatial convolutions by using the summed area table (SAT) [44-46] to detect line and diagonal edge features. The II-based convolutional process does not require specifying the convolutional mask’s parameters and sizes. After the 2D spatial convolutional process, via image enhancement (adjusting the contrast while maintaining the features), the possible lesion can be easily detected and located in an ROI; with the specific bounding box, the “feature pattern” can be easily picked out from the original mammographic image. Then, after converting the 2D feature pattern into a “1D feature vector” by the flattening process, multi-round 1D convolutional processes subsequently enhance the incoming feature vector as a feature signal, which also increases the significant characteristics for further feature extraction and classification. …
Section 3.3., Pages#15 and #17
… We developed the 2D spatial and 1D CNN-based classifier with mammographic image classification to screen for disease presence as normality (Nor) or abnormality (B and M classes). For the MIAS image database [22-23], with the 10-fold cross-validation tests, as seen in Table 4, the experimental results indicated an average precision of 96.70%, an average recall of 96.13%, an average accuracy of 96.40%, and an average F1 score of 0.9641, quantifying the classification performance for identifying breast lesions. The performance of the proposed classifier was superior to that of the traditional 2D CNN-based classifier in design cycle, screening accuracy, parameter assignment (including convolutional masks and the BPNN’s network parameters), parameter adjustment, computational complexity level (iterative computations), and computational time. The BPNN’s optimal parameters had to be determined by the ADAM algorithm in the training stage and were updated by adjusting the network parameter, decay parameter, learning rate, and attenuation rate to minimize the error rate. Additionally, clustering methods, ML methods, and DL-based methods have been used to build different classifier models for clinical / medical purposes, including breast density estimation, mass detection / mass segmentation, mammogram classification / breast lesion screening, and automated breast cancer detection [24-29, 38-39, 60-62], as seen in Table 5. …
… In contrast to the ML-based methods, the DL-based methods, such as the TTCNN, Grad-CAM CNN, DNN (Deep Neural Network), FCN (Fully Convolutional Network), Attention Dense-Unet, and Dense-Unet models [26-29, 38-39, 61], had more complex schemes for setting up the classifiers with minimal expert intervention, using large volumes of unstructured data to train a classifier for classification or detection purposes. For example, TTCNN [38] comprised two convolutional layers (with a 5 × 5 kernel mask size and 16 and 32 kernel masks) followed by MP layers (with a 2 × 2 mask size and 16 and 32 MP masks), a third convolutional layer (with a 3 × 3 kernel mask size and 64 kernel masks), and a fully connected layer (classification layer). These multiple convolutional–pooling layers were used to select the deep features for improved image contrast (contrast adjustment), which limited the size of the output patterns and refined the classifier’s recognition ability. However, the model’s purpose and performance required the available training dataset to be continuously maintained, and excessive convolutional–pooling processes would degrade the position, orientation, and spatial relationships of the desired object. ML-based methods could be rapidly established but might be limited in their results for their applications; DL-based methods required more time to set up the model but could rapidly produce results, with promising classification accuracy from the multiple convolutional processes. In addition, these models required GPU hardware resources to perform the multiple convolutional–pooling processes and the classifier’s training tasks. Therefore, we integrated the 2D spatial and 1D CNN-based classifier to simplify the multiple 2D convolutional processes and reduce the computational complexity level. In the classification layer, the GRA-based classifier used straightforward mathematical operations, without optimization / fine-tuning algorithms or iterative computations, to perform the training and pattern recognition tasks. The advantages of the proposed classifier are listed below:
- the possible breast lesions’ spatial and edge information could be enhanced by the II-based spatial convolutional process in the first convolutional layer, which helped to easily locate the ROI and extract feature patterns from the original mammographic image;
- the suitable two-round 1D convolutional processes could quantify the difference levels, which helped to preliminarily separate the Nor class from the B and M classes;
- the dimension of the feature signals could be reduced by the 1D pooling process, which helped to overcome the classifier’s overfitting problems in the training stage;
- straightforward mathematical operations performed the training and pattern recognition tasks;
- the optimal parameters updated in the training stage did not require convergence condition assignment or parameter adjustment;
- determining the network parameters did not require complex iterative computations or optimization algorithms;
- the classification accuracy could be obtained in less computation time, making it feasible to replace manual screening that requires specific expertise and experience.
- Discuss the explainability of the results using for example GRAD-CAM activation maps.
Response: Thank you for reminding us. Some sentences have been added in Introduction, Page#4.
Introduction, Page#4
… Grad-CAM is also used to enhance the ROI and replace the traditional fully connected layer with global average pooling (GAP) [39]. Then, the feature patterns are obtained through ReLU (Rectified Linear Unit) activation of the GAP-weighted sum of the feature maps [43]. Hence, the classification accuracy of these multilayer structures can be raised for digital image classification. However, the limitations of such a multilayer classifier lie in determining the number of convolutional–pooling layers, the number of convolution kernels, and the sizes of the convolutional masks when setting the structure of the convolutional–pooling layers. Too many convolutional–pooling processes will result in the loss of spatial and edge information and are of no use for key feature extraction. …
- Yong Joon Suh, Jaewon Jung, and Bum-Joo Cho, “Automated breast cancer detection in digital mammograms of various densities via deep learning,” Journal of Personalized Medicine, vol. 10, no. 211, 2020, pp. 1-11.
- R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” arXiv 2016, arXiv:1610.02391.
- Discuss the limitations of your approach, such as the small database used for evaluation.
Response: Thank you for reminding us. Some sentences have been added in Section 3.2., Page#14.
Section 3.2., Page#14
… For small-scale databases, the cross-validation method is used in ML and DL to improve a model’s classification performance when there is not enough data for separate training, validation, and testing splits. Through the 10-fold (Kf = 10) cross-validation tests, for each fold, 200 feature patterns were randomly selected from the dataset to train both classifiers, and another 200 feature patterns were used to validate the classifiers’ performance. The experimental results of the 2D fractional-order CNN-based classifier are shown in Table 3, with an average precision of 95.90% (the positive predictive value, PPV) and an average recall of 96.10% for identifying the feature patterns of tumor cases (B and M) and for accurately identifying the abnormality (TP), respectively; an average accuracy of 96.00% for correctly identifying tumor-free and tumor feature patterns; and an average F1 score of 0.9599 for evaluating the classifier’s performance in accurately separating normality from abnormality. …
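A minimal sketch of 10-fold cross-validation reporting the four indices above; a standard stratified split is shown, whereas the study instead draws 200 training and 200 validation patterns at random per fold:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import (precision_score, recall_score,
                             accuracy_score, f1_score)

def cross_validate(classifier, X, y, folds=10, seed=0):
    """Average precision, recall, accuracy, and F1 over stratified folds."""
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    scores = {"precision": [], "recall": [], "accuracy": [], "f1": []}
    for train_idx, test_idx in skf.split(X, y):
        classifier.fit(X[train_idx], y[train_idx])
        pred = classifier.predict(X[test_idx])
        scores["precision"].append(precision_score(y[test_idx], pred, average="macro"))
        scores["recall"].append(recall_score(y[test_idx], pred, average="macro"))
        scores["accuracy"].append(accuracy_score(y[test_idx], pred))
        scores["f1"].append(f1_score(y[test_idx], pred, average="macro"))
    return {k: float(np.mean(v)) for k, v in scores.items()}
```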
- Revise and extend the conclusions. Currently, they are too short and do not represent the work done and the outcomes contained in this study. Go beyond the summary of works done. Enlist the specific advantages of your method over similar methods. Use the main numerical results from your experiments to support your claims. What are the implications of your study for further research in the domain of biomedical imaging?
Response: Thank you for reminding us. Some sentences have been added in Conclusion, Pages#17-#18.
Conclusion, Pages#17-#18
… Generally, routine imaging examinations can be conducted to detect breast lesions early and increase survival rates, including breast mammography, breast CT, breast ultrasound, and breast magnetic resonance imaging. Among these imaging examinations, breast mammography and breast ultrasound are first-line modalities; breast ultrasound has a poor screening capacity for detecting small calcifications and must be combined with breast mammography to evaluate suspected breast lesions. Based on breast mammography classification, the proposed 2D spatial and 1D CNN-based classifier can be used directly to screen breast lesions in clinical applications, owing to several advantages: the spatial convolutional process enhances the breast lesions’ features, the two-round 1D convolutional processes identify the differences between the normal and B/M classes, and straightforward mathematical operations perform the training and screening tasks. Through 10-fold cross-validation tests, we obtained promising results for screening breast lesions, with a high mean F1 score (0.9641), precision (96.70%), and recall (96.13%) for separating the Nor class from the B and M classes. The proposed screening model overcomes limitations such as parameter assignment, parameter adjustment, iterative computation, and optimization algorithm requirements. Its training scheme has the adaptive capability to retrain the classifier with new image datasets, such as clinical images, the DDSM database, or the INbreast database, in less computation time, which can continuously maintain its intended medical purpose and can also establish a pattern recognition scheme as a software-as-a-medical-device (SaMD) tool. Therefore, we suggest that the proposed screening model could replace manual inspection and traditional CNN methods for specific tasks requiring expertise and experience in medical image examinations. …
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Conclusion: the proposed method cannot replace manual inspection methods.
The Conclusion is not easy to understand. It needs English editing. It must also be rewritten.
Figure 1 is not clear. Caption must explain in detail.
Table 2 must be improved. Caption should explain it better.
Authors must explain the limitations of their investigation. They should compare with similar methods, including commercial solutions.
Do they show any improvement over what is already known?
Author Response
For Reviewer: #1
Comments and Suggestions for Authors:
Response: Thank you for the reviewer's comments. The point-by-point responses to all comments are shown below.
- Conclusion: the proposed method cannot replace manual inspection methods. The Conclusion is not easy to understand. It needs English editing. It must also be rewritten.
Response: Thank you for the reviewer’s comments. The Conclusion section has been modified and rewritten, Page#18.
Conclusion, Pages#17 and #18
… Routine imaging examinations, such as mammography and breast ultrasound imaging, can be used to detect breast lesions early, increasing survival rates and helping to save lives. Mammography and breast ultrasound are both first-line modalities for clinical examinations. However, breast ultrasound imaging has a poor screening capacity for detecting small calcifications (among the earliest signs of breast cancer) and must be combined with diagnostic mammography to evaluate suspected breast lesions and changes in the breast tissues. Breast ultrasound is an assistive tool for screening breast cancer and offers a visual guide for performing a biopsy. In a follow-up screening, which uses low-dose X-rays to view the breast tissue, when an abnormal screening mammogram is obtained, clinicians or radiologists can capture more images to inspect suspicious lesions, such as calcifications or small tumors, the earliest signs of breast cancer. Hence, based on mammographic image classification, the proposed 2D spatial and 1D CNN-based classifier can be used directly to screen breast lesions in clinical applications. In contrast to the ML- and DL-based methods [24-29, 38-39, 60-62], the proposed multilayer classifier has several advantages: (1) a spatial convolutional process that enhances the breast lesions’ features; (2) two-round 1D convolutional processes that identify the differences between the normal and B/M classes; and (3) straightforward mathematical operations for performing the training and screening tasks. Through 10-fold cross-validation tests, we obtained promising results for screening breast lesions, with a high mean F1 score (0.9641), precision (96.70%), and recall (96.13%) for separating the Nor class from the B and M classes.
However, in mammographic images, women’s background tissue type (especially in Asian populations), such as higher breast density, may affect the classification accuracy [39]. Because lesions may be shadowed by dense tissues, as in dense breasts or intermediate mixed-type breast density, AI-based methods may fail to identify them accurately at an early stage, thus increasing patients’ risk of developing breast cancer. The proposed screening model overcomes limitations such as parameter assignment, parameter adjustment, iterative computation, and the need for an optimization algorithm. Its training scheme has the adaptive capability to retrain the classifier with new image datasets, such as clinical images or the DDSM or INbreast databases, in less computation time. Hence, new or atypical mammographic images can be continuously added to the training datasets to rapidly retrain the classifier and maintain its intended medical purpose. Its pattern recognition scheme can be carried out as a computer-aided decision-making tool or as software as a medical device (SaMD) [65-66]. Therefore, we suggest that the proposed automatic screening model could replace traditional CNN methods for specific tasks requiring expertise and experience in medical image examinations, such as diagnostic mammograms, CT, and MRI, helping to reduce the clinical burden and allowing clinicians to focus on follow-up decision-making and medical strategies. …
- Figure 1 is not clear. Caption must explain in detail.
Response: Thank you for the reviewer’s comments. Some sentences have been added in the Introduction, Page#2.
Introduction, Page#2
… Hence, some studies [6, 9-12] extract feature patterns with a specific bounding box from suspicious mammographic images as templates of feature patterns for the normal (Nor), benign (B), and malignant (M) classes, as shown in Figure 1. These morphological features are used to identify normality or abnormality for automatic breast tumor screening. …
… Figure 1. Templates of feature patterns extracted from suspicious mammographic images with a specific bounding box: (a) template patterns for the normal (Nor) class; (b) template patterns for the benign (B) class; and (c) template patterns for the malignant (M) class. …
- Table 2 must be improved. Caption should explain it better.
Response: Thank you for the reviewer’s comments. Table 2 has been modified and some sentences have been added in Section 2.4., Pages#10 and #11.
Section 2.4., Pages#10 and #11
… Hence, to fulfill our medical purpose, we establish a classifier for automatic multi-label classification, consisting of an II-based convolutional process in the 1st convolutional layer for image enhancement; two 1D convolutional processes in the 2nd and 3rd convolutional layers (with a discrete Gaussian mask, data length = 200, and stride = 1); a 1D pooling layer (stride = 100) for feature extraction; and a GRA-based classifier in the classification layer, as summarized for the proposed model in Table 2. …
… Table 2. Summary of models (layer functions, manners, and feature patterns) for the proposed classifier and the traditional CNN-based classifier. …
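Read alongside Table 2, the following is a schematic NumPy sketch of the layer flow described above: II-based 2D enhancement, two-round 1D Gaussian convolution, strided 1D pooling, and a GRA-based decision. The 2D enhancement mask, the Gaussian parameters, the pooled length, and the distinguishing coefficient (zeta = 0.5) are illustrative assumptions, not the manuscript's exact settings.

```python
# Schematic sketch of the layer pipeline summarized in Table 2; the
# numeric settings below are illustrative stand-ins, not the paper's.
import numpy as np

def gaussian_kernel(length=9, sigma=2.0):
    """Discrete Gaussian mask for the 1D convolutional layers."""
    t = np.arange(length) - (length - 1) / 2
    k = np.exp(-t ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def conv1d_same(x, kernel):
    """1D convolution with stride = 1, zero-padded to keep the data length."""
    return np.convolve(x, kernel, mode="same")

def grey_relational_grade(x, ref, zeta=0.5):
    """Mean grey relational coefficient between a vector and a reference."""
    delta = np.abs(x - ref)
    dmax = max(delta.max(), 1e-12)  # guard against identical vectors
    return np.mean((delta.min() + zeta * dmax) / (delta + zeta * dmax))

def screen(roi, templates):
    """roi: 100 x 100 bounding-box pattern; templates: {label: 1D pattern}."""
    # 1st convolutional layer: 2D spatial convolution for image enhancement
    # (a Laplacian-style mask stands in for the II-based process here).
    mask = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
    padded = np.pad(roi, 1)
    enhanced = sum(mask[i, j] * padded[i:i + 100, j:j + 100]
                   for i in range(3) for j in range(3))
    # 2nd and 3rd convolutional layers: two-round 1D convolution (stride = 1)
    # on the flattened pattern, with a discrete Gaussian mask.
    x = enhanced.ravel()
    k = gaussian_kernel()
    x = conv1d_same(conv1d_same(x, k), k)
    # Pooling layer: strided 1D pooling for feature extraction
    # (block averages with stride = 100).
    pooled = x.reshape(-1, 100).mean(axis=1)
    # Classification layer: GRA-based decision against trained templates.
    return max(templates, key=lambda c: grey_relational_grade(pooled, templates[c]))
```

In the actual model, the template patterns obtained at the training stage would play the role of `templates`; this sketch only mirrors the layer order in Table 2.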
- Authors must explain the limitations of their investigation. They should compare with similar methods, including commercial solutions. Do they show any improvement over what is already known?
Response: Thank you for the reviewer’s comments. Section 3.3. and the Conclusion section have been modified and rewritten, Pages#15, #17, and #18. The revised Section 3.3. text is quoted below; the revised Conclusion is quoted in full in the response to the first comment above.
Section 3.3., Pages#15 and #17
… Additionally, both ML- and DL-based classification methods have been used to build classifier models for clinical and medical purposes, including breast density estimation, mass detection / mass segmentation, mammogram classification / breast lesion screening, and automated breast cancer detection [24-29, 38-39, 60-62], as seen in Table 5. ML methods are based on low-level image features, such as shapes, textures, and local key-point features [24-25, 27, 60, 62]. ML-based models, such as SVM, ANN, and clustering methods [24-25, 60], have been used to establish various computer-aided vision classifiers. With the MIAS database, the SVM and ANN methods achieved accuracy rates of 94% and 97.08% for mammogram classification and mass detection, respectively. Clustering methods, such as K-means, Fuzzy C-means, and GA-based feature selection algorithms [27, 62], achieved accuracy rates of 91.18%, 94.12%, and 84.5% for mass segmentation and mammogram classification, respectively. However, SVM and ANN require manually labeled classes and selected feature patterns to train the classifier, along with ongoing human participation and expert intervention to feed in new training datasets and continuously model the intended tasks. Hence, such models need more data to feed the classifier and designer confirmation of accurate classification or correct responses. In clinical applications, over time, such a model must handle new datasets to retrain the classifier, which is inefficient for maintaining the classifier's performance and makes real-time adjustment difficult. Clustering methods (CM) [27, 62] are unsupervised learning techniques that help the classifier handle complex tasks with large, highly flexible, and unpredictable / unlabeled datasets. However, CMs have no clear standard for evaluating the value of their results, so it is hard to know which of the classifier's findings are accurate or useful.
In contrast to the ML-based methods, the DL-based methods, such as TTCNN, Grad-CAM CNN, DNN (Deep Neural Network), FCN (Fully Convolutional Network), Attention Dense-Unet, and Dense-Unet models [26-29, 38-39, 61], have more complex schemes for setting up classifiers with minimal expert intervention, using large volumes of unstructured data to train a classifier for classification or detection purposes. For example, TTCNN [38] comprises two convolutional layers (with a 5 × 5 kernel mask size and 16 and 32 kernel masks), each followed by an MP layer (with a 2 × 2 mask size and 16 and 32 MP masks), a third convolutional layer (with a 3 × 3 kernel mask size and 64 kernel masks), and a fully connected layer (classification layer). With the DDSM [19, 63], INbreast [64], and MIAS [22-23] databases, TTCNN achieved accuracy rates of 99.08%, 96.82%, and 96.57%, respectively, for breast cancer diagnosis and classification. The Grad-CAM-based CNNs, including DenseNet-169 and EfficientNet-B5 [39], can detect malignant lesions in both craniocaudal and mediolateral oblique view images, highlighting the ROI with red color-coded areas that indicate the positive region and identify the suspicious lesions. This visualization can locate and identify abnormalities in mammograms in cases of mass or calcification. DenseNet-169 and EfficientNet-B5 achieved mean accuracy rates of 88.1% and 87.9%, respectively, for automated breast cancer detection. Thus, these multiple convolutional–pooling layers are used to select 2D features to improve image contrast (contrast adjustment), limit the size of the output patterns, and refine the classifier's recognition ability. However, maintaining the model's purpose and performance requires continuously maintaining the available training dataset, and excessive convolutional–pooling processing can degrade the position, orientation, and spatial relationships of the desired object. …
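For the ML baselines above, a supervised SVM pipeline of the kind cited in [24-25, 60] might look like the following scikit-learn sketch; the synthetic descriptors and the kernel settings are stand-ins, since the cited works define their own hand-crafted features.

```python
# Minimal sketch of the supervised ML baseline discussed above: an SVM
# trained on hand-crafted, low-level image features (e.g., shape and
# texture descriptors). The feature extraction step is a hypothetical
# stand-in for the descriptors used in the cited works.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
features = rng.normal(size=(300, 32))   # 300 patterns x 32 descriptors (synthetic)
labels = rng.integers(0, 2, size=300)   # 0 = normal, 1 = abnormal

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(features, labels)
print("training accuracy:", svm.score(features, labels))
```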
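For the DL baselines, the TTCNN layer stack described above (from [38]) can be sketched as follows; it is a rough approximation, with the input size, activations, and three-class softmax output assumed for illustration rather than taken from the cited work.

```python
# Rough sketch of a TTCNN-like stack: two 5x5 convolutional layers with
# 16 and 32 kernels, each followed by 2x2 max pooling, a third 3x3
# convolutional layer with 64 kernels, and a fully connected
# classification layer. Input size, activations, and the three-class
# output are assumptions for illustration only.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 100, 1)),      # assumed ROI size
    tf.keras.layers.Conv2D(16, (5, 5), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (5, 5), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation="softmax"),  # Nor / B / M
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```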
Author Response File: Author Response.pdf
Reviewer 2 Report
Most of my concerns were addressed and the paper was substantially improved.
Thus, I recommend it for publication.
Author Response
For Reviewer: #2
Comments and Suggestions for Authors:
Most of my concerns were addressed and the paper was substantially improved. Thus, I recommend it for publication.
Response: Thank you for the reviewer’s comments.
Reviewer 3 Report
The manuscript was revised and improved. Several issues, however, remain:
- Discuss the explainability of the results using, for example, Grad-CAM activation maps (the issue was not addressed). Present your activation maps and discuss them.
- Discuss the relevance and implications of your methodology within the current state of the art in the relevant research field versus other works; see "Systematic Review of Computing Approaches for Breast Cancer Detection Based Computer Aided Diagnosis Using Mammogram Images".
- Check the correctness of all references to tables and figures, esp. Table #1, line 388.
- Check language, there are many mistakes and typos such as "shoewd"
Author Response
For Reviewer: #3
Comments and Suggestions for Authors:
The manuscript was revised and improved. Several issues, however, remain:
Response: Thank you for the reviewer’s comments. The point-by-point responses to all the referees’ comments are shown below.
- Discuss the explainability of the results using, for example, Grad-CAM activation maps (the issue was not addressed). Present your activation maps and discuss them.
Response: Thank you for the reviewer’s comments. Some sentences have been added in Section 3.3., Page#17; the added text is quoted in full in the response to Reviewer #1 above.
- Discuss the relevance and implications of your methodology within the current state of the art in the relevant research field versus other works; see "Systematic Review of Computing Approaches for Breast Cancer Detection Based Computer Aided Diagnosis Using Mammogram Images".
Response: Thank you for the reviewer’s comments. The Conclusion section has been modified and rewritten, Pages#17 and #18; the revised text is quoted in full in the response to Reviewer #1 above.
- Check the correctness of all references to tables and figures, esp. Table #1, line 388.
Response: Thank you for reminding us. Corrected as suggested.
- Check language, there are many mistakes and typos such as "shoewd"
Response: Thank you for reminding us. Corrected as suggested.
Author Response File: Author Response.pdf