Deep Learning Architecture Improvement Based on Dynamic Pruning and Layer Fusion
Round 1
Reviewer 1 Report
The manuscript deals with a very interesting topic: dynamic pruning and layer fusion in deep learning. In the first parts of the manuscript, the authors describe similar works in detail. They then describe their novel approach to pruning and layer fusion (followed by experiments and results).
The authors propose an improved CNN architecture for optimising redundant models. The manuscript is very clearly written, and all results are supported by experiments.
I have no questions or remarks, and I recommend accepting the manuscript after proofreading.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The main idea of the paper, that layer-based and channel-based pruning can be connected, is not entirely new, but the idea presented in the paper, that residual connections (which are commonly known in current models) can be exploited, is a nice novel idea and can be interesting for the community.
The idea is great and the description of the method is clear and understandable.
Unfortunately, the results do not back up the efficiency of the algorithm. The main problem is that the authors have investigated only a single dataset (CIFAR), which contains extremely low-resolution images (32x32).
In training, these images are typically upscaled to match the input dimensions of the models (which is 224x224). (It is not explained in the paper whether the images were upscaled or not, but I assume this was the case.) Because of this, networks trained on this dataset can be heavily pruned, and the results do not always reflect real circumstances, since the data is heavily redundant.
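The degree of redundancy introduced by such upscaling can be illustrated with a minimal sketch. It assumes nearest-neighbour upscaling by an integer factor, which is only one possible choice and is not confirmed by the paper; the names `upscale_nearest`, `SRC`, `DST` and `FACTOR` are hypothetical.

```python
# Sketch only: nearest-neighbour upscaling is an assumption, not the authors' pipeline.
SRC, DST = 32, 224          # CIFAR side length and the assumed network input side
FACTOR = DST // SRC         # 7: each source pixel becomes a 7x7 block


def upscale_nearest(image):
    """Upscale a square 2D list by the integer FACTOR using nearest-neighbour copying."""
    return [[image[r // FACTOR][c // FACTOR] for c in range(DST)] for r in range(DST)]


# Give every source pixel a unique value so that copies are easy to count.
source = [[r * SRC + c for c in range(SRC)] for r in range(SRC)]
upscaled = upscale_nearest(source)

total_pixels = DST * DST                                      # 50176
distinct_values = len({v for row in upscaled for v in row})   # 1024
copies_per_pixel = total_pixels // distinct_values            # 49

print(total_pixels, distinct_values, copies_per_pixel)
```

Under this assumption the network sees 49 copies of every original pixel. Interpolating methods such as bilinear resizing smooth the copies but add no new information.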
The presented results are okay, but the authors should present results on other commonly cited datasets, such as ImageNet or MS-COCO. Since pretrained models are available for these datasets, the experiments should have the same complexity.
Also, the authors speak about pruning in general, but they only present results on classification tasks. It is a common observation that pruning in other tasks, e.g. semantic and/or instance segmentation, is less efficient than in classification problems. Because of this, it would be great if the authors presented results on segmentation and detection problems as well.
Also, CIFAR is a fairly simple problem (compared to ImageNet) and contains redundant information at this resolution. Based on these, I suspect that the investigated architectures (DenseNets and ResNets) are 'too complex' for these problems. By this I mean that their performance increases only slightly compared to a simple five-layer convolutional network.
Based on this, I do not think that measuring the accuracy drop on CIFAR is a balanced measurement, since the data is redundant and the architecture is complex. I think it would be fairer to compare each data point with a model of similar complexity. If we remove some layers and channels, we should take the resulting architecture, retrain the model from scratch, and compare this model to the pruned version. If the pruned version is more accurate, it shows that the model managed to keep features of the more complex model.
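The comparison suggested here can be sketched as a small protocol. This is an illustration only: `compare_pruned_to_scratch` and its four arguments are hypothetical hooks that the experimenter would supply, not part of the authors' code or of any library.

```python
from typing import Any, Callable, Dict


def compare_pruned_to_scratch(
    pruned_model: Any,
    build_same_architecture: Callable[[], Any],
    train: Callable[[Any], Any],
    evaluate: Callable[[Any], float],
) -> Dict[str, float]:
    """Compare a pruned model with a same-shape model trained from scratch.

    All four arguments are hypothetical hooks supplied by the experimenter:
    - pruned_model: the network after pruning (and any fine-tuning)
    - build_same_architecture: returns a freshly initialised network with the
      same layers and channel counts as the pruned one
    - train: trains a network and returns it
    - evaluate: returns test accuracy in [0, 1]
    """
    scratch_model = train(build_same_architecture())
    pruned_acc = evaluate(pruned_model)
    scratch_acc = evaluate(scratch_model)
    return {
        "pruned_acc": pruned_acc,
        "scratch_acc": scratch_acc,
        # A positive gain suggests pruning retained features of the larger model.
        "gain_from_pruning": pruned_acc - scratch_acc,
    }
```

Both models should be evaluated on the same held-out test set, and the from-scratch training budget should be reported so that the comparison is reproducible.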
The language of the paper is okay, but it contains a few typos and strange expressions ("less accuracy loss" -> "lower accuracy loss"). Figure 2 does not mention which data is plotted in it; I assume it is CIFAR, but this is unclear and the data is only mentioned in Section 4. It is also unclear how the parameter pu-pl is defined and why it is 0.03; it could be empirical, but it should be described.
Altogether the main idea of the paper is interesting, but the paper should be slightly modified and rewritten and additional experiments are needed to prove the validity of the idea.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
This work proposes an architecture improvement approach for Convolutional Neural Networks that optimizes the model at the channel level and the layer level. First, the model is compressed by dynamic pruning; then the channel-level compressed model is further optimized by layer fusion, in which the redundant structure is removed and other layers take over its function.
The paper is well written, but a paragraph with application examples of neural networks is missing. I will consider the work for publication only after a major revision.
Minor comments
Not all acronyms are spelled out. Either provide an initial table explaining all acronyms or define each one before it is first used.
Generally, the figure captions are not very detailed or explanatory.
Do not use the same words as the title in the abstract.
Major comments
Introduction
In the first lines of the introduction the conditions in which CNNs are used are reported. However, no concrete examples are given. To better contextualize the problem, insert a paragraph with application examples of neural networks.
A workflow of the convolutional neural network architecture could help the readers in better understanding the problem.
YOLOs have also been used for similar purposes. Give some examples.
No clear aim is stated. Add one to make the work easier to follow.
Experimental
Organize the structure inserting summary tables of the main results obtained.
Conclusions
This section is not well organized.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
The authors provide a way to improve the performance of deep learning by introducing pruning and layer fusion. Several comments are given as follows.
1. The authors provide the algorithm to reduce the redundancy present in the deep learning structure.
2. The algorithms of pruning and layer fusion seem to provide an efficient way to remove redundancy.
3. The proposed algorithms only decrease the accuracy by 0.72% with the training data and the specified deep learning model. However, the proposed algorithms need more computational resources to prune and fuse layers in deep learning. What is the extra cost of the proposed algorithms in deep learning? Will the extra cost significantly slow down the training process? The authors should give more explanation.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
The paper has improved significantly. The newly added experiments on the reduced ImageNet set are great and also underline the applicability and advantage of the method.
However I still have two concerns (one minor):
- I understand that the authors did not have time to run experiments on segmentation tasks (although pretrained networks are available there as well, such as DeepLab or U-Net on Cityscapes), and the classification results might be sufficient. However, they talk about classification and detection in the introduction, so they should add a sentence explaining that they have investigated classification only and that the efficiency of pruning can differ in detection and segmentation tasks.
- My major concern is about CIFAR-10 and the selected architecture, ResNet-56. ResNet-56 expects an input image of 224x224 pixels; the spatial dimensions of the image are decreased layer by layer, and in the end it is squeezed to a one-dimensional vector. This might depend on the actual implementation (and in PyTorch smaller inputs can be used as well without errors), but a traditional ResNet-56 cannot be used with a 32x32 input.
(Please check this image of the spatial dimensions of ResNet34 (I could not find it for ResNet56, but it is basically the same): https://miro.medium.com/max/1400/1*Y-u7dH4WC-dXyn9jOG4w0w.png
If an input image of 32x32 is used, there will be 512x1 neurons in Layer 4 instead of 512x7.)
In this case the number of neurons in the fully connected layer should be rescaled, or the same activations are used multiple times, which still causes redundancy. This should be checked and described in the paper, because the ResNet architecture is designed
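The arithmetic behind this concern can be sketched as follows. The sketch assumes an ImageNet-style ResNet downsampling path (a 7x7 stride-2 convolution, a 3x3 stride-2 max pooling, then four stages of which the last three halve the resolution); the actual implementation used in the paper may differ, and the names `conv_out`, `DOWNSAMPLING` and `spatial_trace` are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed ImageNet-style ResNet downsampling path: (name, kernel, stride, padding).
# Each stage is represented by the layer that sets its spatial resolution.
DOWNSAMPLING = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer1", 3, 1, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]


def spatial_trace(input_size: int) -> dict:
    """Return the feature-map side length after each stage for a square input."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in DOWNSAMPLING:
        size = conv_out(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


print(spatial_trace(224))  # layer4 side length: 7
print(spatial_trace(32))   # layer4 side length: 1
```

Under these assumptions a 224x224 input reaches Layer 4 as a 7x7 map, while a 32x32 input collapses to 1x1, which matches the 512x7 versus 512x1 figures quoted above.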
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
The manuscript in its present form can be published in Electronics, as the authors have improved it. Parts that better contextualize the work have been added. Finally, the information in the figure and table captions has now been expanded as suggested.
I don’t need to review another version because I accept the work in the present form.
Author Response
Response to Comments of Reviewer #3:
Thank you very much for your recognition of our work.