Peer-Review Record

A High-Density Crowd Counting Method Based on Convolutional Feature Fusion

Appl. Sci. 2018, 8(12), 2367; https://doi.org/10.3390/app8122367

by Hongling Luo^1,2, Jun Sang^1,2,*, Weiqun Wu^1,2, Hong Xiang^1,2, Zhili Xiang^1,2, Qian Zhang^1,2 and Zhongyuan Wu^1,2

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 20 October 2018 / Revised: 14 November 2018 / Accepted: 20 November 2018 / Published: 23 November 2018

Round 1

Reviewer 1 Report

The paper presents an interesting topic. The authors might consider the following suggestions prior to publication in Applied Sciences:

1. The Introduction needs to be expanded. Please add a specific motivation explaining why crowd counting and density estimation are hot topics.

2. The authors could cite more papers in the Introduction to inform the reader about the state of the art of the research topic.

3. Section 1 and Section 2 could probably be merged.

4. Please double-check all equations; the font size of the equations is larger than that of the body text. Please use a proper formula editor.

5. The resolution of Figs. 4 and 5 needs to be enhanced. The reviewer suggests using solid and dashed lines instead of different colors, so that readers who print the paper in black and white can still read the figures.

6. It would be better if all panels of Figure 6 were presented on one page instead of on separate pages. The authors could also use sub-figure captions such as (a), (b), (c), and (d) to explain each category in more detail.

7. As the reviewer understands it, there are two types of deep learning methods: (1) FNN and (2) CNN. Could the authors provide a brief scientific reason why CNN was selected?

Author Response

Please refer to the uploaded file.

Author Response File: Author Response.pdf

Reviewer 2 Report

A High-density Crowd Counting Method Based on Convolutional Feature Fusion

The authors claim to have proposed a method that fuses low- and high-dimensional CNN features for crowd counting. They have followed the method suggested by Zhang et al. (14) in computing the ground truth, except that they have tried varying values of β to obtain better results. The proposed work lacks novelty and is incremental with respect to the topic under discussion. Any claims of novelty should therefore be removed.
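For readers unfamiliar with the role of β mentioned above, the following is a minimal sketch of a geometry-adaptive Gaussian ground truth, assuming the commonly described scheme in which each annotated head is spread by a Gaussian whose width is β times the mean distance to its nearest neighbours. The function names, the choice of k = 3 neighbours, the default β = 0.3, and the minimum width are illustrative assumptions, not details taken from the manuscript.

```python
import math


def adaptive_sigmas(points, beta=0.3, k=3):
    """Return one Gaussian sigma per head: beta times the mean distance
    to its k nearest neighbours (fewer if there are not enough heads)."""
    sigmas = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nearest = dists[:k]
        if nearest:
            sigmas.append(beta * sum(nearest) / len(nearest))
        else:
            # Single isolated head: hypothetical fixed spread of one pixel.
            sigmas.append(1.0)
    return sigmas


def density_map(points, height, width, beta=0.3, k=3, min_sigma=0.5):
    """Build a height x width density map whose sum equals the head count.
    Assumes every point lies inside the image. min_sigma is an assumed
    floor that keeps each kernel from collapsing to nothing."""
    grid = [[0.0] * width for _ in range(height)]
    for (px, py), sigma in zip(points, adaptive_sigmas(points, beta, k)):
        sigma = max(sigma, min_sigma)
        weights = [
            [
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                for x in range(width)
            ]
            for y in range(height)
        ]
        total = sum(map(sum, weights))
        # Normalise each kernel so that one head contributes exactly 1.
        for y in range(height):
            for x in range(width):
                grid[y][x] += weights[y][x] / total
    return grid


heads = [(2, 2), (6, 2), (2, 6)]  # hypothetical (x, y) annotations
dmap = density_map(heads, height=10, width=10)
estimated_count = sum(map(sum, dmap))
```

Because each kernel is normalised inside the image, summing the map recovers the number of annotated heads; changing β only changes how widely each head is spread.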

The authors have worked with Part A and Part B of the ShanghaiTech dataset. However, the literature survey is not adequate and the results obtained are not state-of-the-art. A few studies, listed below, have already obtained better results than those reported here:

Tang et al. Low-Rank and Sparse Based Deep-Fusion Convolutional Neural Network for Crowd Counting. Mathematical Problems in Engineering 2017, 1-11. DOI: 10.1155/2017/5046727

Han et al. Image Crowd Counting Using Convolutional Neural Network and Markov Random Field. JACIII 2017, 21(4), 632-638. DOI: 10.20965/jaciii.2017.p0632

The aforementioned studies and the rest of the literature have worked not only with the ShanghaiTech dataset but have also reported performance on other datasets, including WorldExpo, UCSD, and Pest. In this regard, the manuscript lacks a description of how the proposed method generalizes to other commonly used datasets. The authors should report the performance of the proposed method on other datasets and perform a comparative study.

How do the authors justify the selection of a VGG16 model for the proposed task? Did they attempt a simple custom CNN as a baseline? What was the effect of using other pretrained models for the current task?

How did the authors settle on the optimal model architecture and hyperparameters for the dataset under study? Did they attempt any optimization strategy such as Bayesian learning or a grid search?
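To make the grid-search option concrete, here is a minimal sketch of an exhaustive search over a small hyperparameter grid. The search space, the parameter names, and the toy objective are hypothetical stand-ins; in practice the objective would train the model and return a validation error.

```python
import itertools


def grid_search(evaluate, grid):
    """Try every combination in `grid` and return (best_params, best_score),
    where a lower score is better."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_error(params):
    """Hypothetical stand-in for 'train the model, return validation MAE'."""
    return (params["beta"] - 0.3) ** 2 + (params["log10_lr"] + 5) ** 2


# Assumed search space, for illustration only.
search_space = {"beta": [0.1, 0.2, 0.3, 0.4], "log10_lr": [-6, -5, -4]}
best, score = grid_search(toy_validation_error, search_space)
```

The cost grows with the product of the grid sizes (12 evaluations here), which is why model-based strategies are often preferred once each evaluation means a full training run.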

Did the authors measure how the loss varies with the number of training and test images in Part A and Part B? How do they relate the performance of the proposed method to the size of the training data?

Is there an obvious pattern observed in the variation of β and the corresponding MSE and MAE values for the dataset under study?
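For reference, the two metrics named in this question can be sketched as follows, assuming the convention common in the crowd counting literature, where MAE is the mean absolute count error per image and "MSE" denotes the square root of the mean squared count error. The manuscript's own definitions are not reproduced here, and the counts below are made-up illustrative values.

```python
import math


def mae(true_counts, predicted_counts):
    """Mean absolute error between per-image true and predicted counts."""
    n = len(true_counts)
    return sum(abs(t - p) for t, p in zip(true_counts, predicted_counts)) / n


def mse(true_counts, predicted_counts):
    """'MSE' as usually reported in crowd counting: the square root of the
    mean squared count error (assumed convention)."""
    n = len(true_counts)
    squared = sum((t - p) ** 2 for t, p in zip(true_counts, predicted_counts))
    return math.sqrt(squared / n)


# Hypothetical per-image counts, for illustration only.
truth = [100, 200, 300]
preds = [110, 190, 330]
mae_value = mae(truth, preds)
mse_value = mse(truth, preds)
```

Evaluating both functions once per candidate β would give the table of (β, MAE, MSE) triples needed to check for a pattern.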

The authors should report the number of training epochs and the computational time, and perform a comparative study.

The manuscript has to be proofread by a native English speaker. There are typos and grammatical errors throughout. A few are mentioned herewith:

Abstract: Lines 16 and 17; page 3: Line 114; page 4: Lines 134 and 135; page 5: Lines 161 and 162; and so on.

Abbreviations need to be defined at first use and then used consistently throughout.

References need to be properly cited within the text.

Author Response

Please refer to the uploaded file.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Dear Authors,

Thank you for providing the revised version. The paper is much better after the amendments.

Kind regards,

- Reviewer -

Reviewer 2 Report

The authors have answered this reviewer's queries to satisfaction.
