Article
Peer-Review Record

Fusion of Deep Sort and Yolov5 for Effective Vehicle Detection and Tracking Scheme in Real-Time Traffic Management Sustainable System

Sustainability 2023, 15(24), 16869; https://doi.org/10.3390/su152416869
by Sunil Kumar 1, Sushil Kumar Singh 2,*, Sudeep Varshney 3, Saurabh Singh 4, Prashant Kumar 5, Bong-Gyu Kim 6 and In-Ho Ra 7,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4:
Submission received: 28 October 2023 / Revised: 20 November 2023 / Accepted: 4 December 2023 / Published: 15 December 2023

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

(1) Real-time speed is an important parameter of vehicle detection. You should compare the running speed of your algorithm with that of other algorithms.

(2) Vehicles that are far away from the camera appear very small. How well does the algorithm detect these vehicles, and how does it avoid assigning an ID to the wrong vehicle target or eliminate a wrong ID?

(3) Can the pictures selected from the datasets cover the various complex application scenarios, such as different vehicle types, lighting conditions, weather conditions, and occlusion situations?

Author Response

Thank you for the comments. Please check the uploaded rebuttal.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Overall, I found this manuscript to be well written, and it can be published as it is.

Comments for author File: Comments.pdf

Author Response

Thank you for the comments. Please check the uploaded rebuttal.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper presents vehicle detection and tracking methods using YOLOv5 and Deep SORT to achieve high accuracy and better real-time performance in terms of precision, recall and mAP than existing algorithms. Deep SORT is integrated with YOLOv5 to reduce the false detections and missed detections due to occlusions, illumination, counting unique IDs for tracking vehicles and other external factors. To evaluate the effectiveness and performance of the proposed method, simulations were conducted on the BDD100K and PASCAL datasets. This paper may contain some publishable ideas, but some major modifications are necessary. Some detailed comments are listed as follows, which will help to improve the quality of the manuscript.

1. The first paragraph of section 3.4 "The fig. 2, 3, and 4 already show the proposed…". Fig. 4 is not the proposed scheme of vehicle detection and vehicle tracking. Fix it.

2. What is the computational efficiency of the proposed method? It is recommended to add data for an efficiency comparison.

3. The Conclusion Section needs a major revision, and it is recommended that the authors add a summary of what new discoveries have been made in the field through this study, and indicate what possible results these discoveries are likely to produce.

4. Some mathematical notations and presentations are not rigorous/clear enough to correctly understand the contents of the paper. It is suggested to check all the definitions of variables and redefine the missing information when preparing the submission of the paper.

5. In the Introduction, the discussion of recent related work on machine learning prediction models should be strengthened. Several recent investigations are recommended for the authors' reference, e.g., "10.1007/s10462-021-10061-9"; "10.1016/j.ast.2023.108325".

6. A large number of investigations have studied machine learning by combining various algorithms, for instance swarm intelligence algorithms, gradient algorithms and so on. Some articles are given below, but the list is not limited to these. Relative to these methods, what are the strengths of the proposed algorithm? E.g., "10.1080/24751839.2020.1833137"; "10.1016/j.ijfatigue.2022.106812".

Author Response

Thank you for the comments. Please check the uploaded rebuttal.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The article is interesting, important and should be published. However, it needs some work to make it more readable. As it is, it is quite repetitive (extolling the virtues of YOLOv5 and SORT/Deep SORT in too many instances) and confusing, making it hard to understand the specifics. In addition, it uses acronyms without defining them.

With regard to the method used, I would like to see, perhaps in an Appendix, more details on the implementation. For instance, YOLOv5 is used for the detection, without giving a single example of how this is done, e.g. in Frame 1 there is a vehicle, labelled vehicle 1 with the characteristics x1,x2,x3,… in box described by coordinates….. Then ‘data association’ (lines 447-448) between different frames is done as follows: Explain via an example how a vehicle identified in frame 1 is associated with another one from frame 2. Or what happens if it is not in frame 2. Or if it is in frame 2 and not in frame 1. Such information would make the manuscript much more readable in my view.
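The kind of worked example requested above could look like the following minimal sketch. It is not the authors' implementation and not the full Deep SORT procedure, which also relies on Kalman-filter motion prediction and appearance features; here a simple greedy matching on bounding-box overlap (IoU) is swapped in purely for illustration. The function names, the box format `(x1, y1, x2, y2)`, the threshold of 0.3 and all coordinates are assumptions.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, top-left and bottom-right corners.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: Dict[int, Box], detections: List[Box],
              threshold: float = 0.3):
    """Greedily match existing track IDs to new detections by IoU.

    Returns (matches, lost_ids, new_indices):
      matches     - track ID -> index of the matched detection
      lost_ids    - tracks from the previous frame with no match
      new_indices - detections that start a new track
    """
    # Score every (track, detection) pair, best overlap first.
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, j in pairs:
        if score < threshold:
            break  # remaining pairs overlap too little to be the same vehicle
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    lost_ids = [tid for tid in tracks if tid not in matches]
    new_indices = [j for j in range(len(detections)) if j not in used]
    return matches, lost_ids, new_indices


# Frame 1: two vehicles already tracked (hypothetical coordinates).
frame1_tracks = {1: (100, 100, 200, 180), 2: (400, 120, 480, 190)}
# Frame 2: vehicle 1 has moved slightly, vehicle 2 is gone, a new vehicle appears.
frame2_detections = [(108, 102, 208, 182), (600, 300, 660, 350)]

matches, lost_ids, new_indices = associate(frame1_tracks, frame2_detections)
print(matches, lost_ids, new_indices)
```

In this toy case vehicle 1 keeps its ID because its box in frame 2 overlaps strongly with its box in frame 1, vehicle 2 is reported as lost (present in frame 1 only), and the second detection would receive a new ID (present in frame 2 only). A real tracker would typically keep a lost track alive for a few frames before deleting it.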

In more detail:

 

1.     Introduction:

This is an important, but confusing section and lacks focus.

The section starts with a sentence on the difficulty of monitoring EACH vehicle. It is by no means clear what ‘monitoring’ means, and if one talks about traffic management in the title, it is unclear why you need to monitor each vehicle rather than a flow (this is akin to the difference between fluid mechanics and tracking the motion of each individual fluid molecule). So I think a sentence on what exactly the interest is would help: Avoiding congestion? For this, monitoring a ‘front’ or fluid velocity is typically enough, and the main management tools one has are either to use sideways or to curb the incoming flow to manageable levels. Similar management techniques are applied in, say, telecommunications traffic to deal with congestion, for instance during earthquakes.

The second sentence could be improved englishwise, i.e. line 44 ‘has been improved’: get rid of ‘been’. However, the entire sentence leaves a ‘so what’, which I presume is ‘so the number of cars has increased’. But this is well known and in any case covered in the next sentence. So this sentence can go.

The 3rd sentence on line 44, ‘Because of the significant growth’, is especially confusing: Accidents, drunk driving, road condition and traffic violations are typically NOT considered part of traffic management. Next, the authors mention that congestion is the most common issue, but the reader is left wondering how this is connected to accidents, drunk driving etc.

Next, autonomous driving systems are promoted as ‘eliminating’ some mysterious and unexplained ‘hidden dangers of traffic safety’ and also ‘improve safety’. What are hidden dangers and how are they distinct from improving safety? And how is all this related to ‘sustainable intelligent transportation system’? What system?

I assume the authors mean something like this:

‘Autonomous driving systems are promoted as able to both improve safety (e.g. drunk/reckless driving or violations and improved decisions in bad road conditions) and make decisions that will alleviate congestion.’ If so, then this sentence could replace the part of the paragraph from line 43 (‘With the rapid growth’) to the end of the paragraph, as it is both shorter and clearer. And it sets the stage that we are interested in autonomous vehicles and in what aspects specifically.

I also believe that the 1.1. Motivation section is misplaced and belongs here. Still, even in 1.1. autonomous vehicles are mixed with traffic engineers and law enforcement. So it is unclear where detection and tracking are done: In the (autonomous self-driving) vehicle to avoid accidents? In law enforcement traffic cameras? Or in, say, satellite images for traffic monitoring?

I would like to stress that although Table 1 is very useful, a quantification of accuracy is lacking. In fact, metrics are very important, and ‘acceptable’ accuracy is very much dependent on the application. For instance, for traffic management/control, I would imagine an 80-90% accuracy would be quite fine, but when autonomous vehicles are concerned and for safety purposes, a single error may be unacceptable. So ‘highly accurate’ is practically ambiguous and not very helpful.

 

Second paragraph:

“Therefore, Vehicle recognition and tracking are necessary…”

There are two aspects here: First, for the actual driving (and avoiding accidents), yes, but note that a limited range is required, e.g. vehicles within a certain distance, which is related to speed and reaction times. For avoiding congestion, it is by no means clear that this is required: First, because the sensors can only look a certain distance, and second, because traffic information on a much larger scale can be reliably obtained from other sources, such as satellites, e.g. Google Maps, provided the vehicle is network-connected to such info. So, as this is presented, it is far from clear what exactly the authors have in mind and what the problems are. Hence I would recommend a rewrite along the following lines:

“In order to (see lines 65-67)….., vehicle recognition and tracking is essential.” Then describe methods and problems with “vehicle recognition and tracking”.

 

Third paragraph:

 

 I already explained that traffic management is different from individual vehicle tracking. The ‘response to the above problems’ is unclear, but is clarified by the rest of the sentence ‘vehicle detection, tracking and counting’ .

 

 

So I would urge the authors to a) clarify what they mean and b) merge paragraphs 1-3 with the Motivation subsection, explaining possible applications.

 

Next, put the 1.2 Contribution subsection here, explaining what is new with this work. You can mention YOLO and SORT here and their advantages, as in lines 78-111.

 

Section 2

 

line 205 “improved faster object detection”. Do the authors really want to keep both ‘improved’ and ‘faster’?

line 210 “The constraints” makes no sense. I assume the authors mean drawback?

Line 212: on THE one stage approach

The whole paragraph is not very clear, discussing the details before even discussing how YOLO works in this particular case. This is why an Appendix with example would be very helpful

Line 263: THE SORT algorithm

 

Line 274: ‘Existing Research’: Probably ‘Previous Works’ is a better title

Lines 285-286: ‘not able to produce better (than what?) results when it came to the viewing angle and natural environment’ is very unclear. Do the authors want to say that SVM was too sensitive to viewing angle and ‘natural environment’? (I assume this means background, but at any rate it is very unclear and the authors should clarify.)

Line 291: ‘specific environments’ is unclear. An example could help.

Lines 300-301. Bad English. The authors probably mean ‘With recent rapid advances in machine learning, deep learning using CNNs is a hot and promising area for computer vision’.

 

Page 7: Table 1 is very nice, but as alluded to before, ‘highly accurate’ is too vague and should be quantified, even if only by giving a range.

 

Subsection 2.5 is kind of repetitive, misses the issue of metrics discussed above and the points it makes are best placed in the Introduction, as discussed.

Section 3:

Line 383 ‘false detection’: Elaborate as to what this means (FP, FN and what could cause them)

Line 385 ‘this issue’. Which issue? Occlusions, flickering, blur, camera movement or false detection?

Fig 1. As discussed, I believe explaining how a vehicle is represented (e.g. a box with such coordinates) and what features are stored (color, size, …) would make the paper much more readable.

 

Line 402: ‘,fast speed than’-> and higher speed than

Line 406 The configuration of THE Darknet

Line 407: productive due to THE smaller community OF users.

Line 409: higher detection accuracy: How much higher?

Line 437: ‘denotes the vehicle’s location’->notes the vehicle’s location (for that frame)

Line 438. THE methodological flow

 

Line 447: as discussed, the authors should explain how the ‘data association’ is performed.

 

The flowcharts and algorithm descriptions in 3.3 are poorly written. Symbols are not explained or are poorly explained, e.g. covariance matrix (of what?). The authors should describe the variables in more detail AND make contact with the problem at hand. For instance, we have a frame with length X and height Y. The vehicle identified is …. The authors do give some details (see 3.4), but a clear, detailed, readable explanation is still missing.

 

Section 4.

FP, FN etc. need to be further explained. Is, for instance, a FN the inability to identify a vehicle, to correctly bound it within a frame or to correctly track it?

Similarly, Table 5 should give an example of what the precision, recall and mAP mean:

If we have 100 vehicles, what do the stated precision, recall and mAP mean?
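A small worked example of the kind asked for here might run as follows. The counts are invented for illustration and are not taken from the manuscript's Table 5: assume 100 vehicles are truly present, the detector outputs 95 boxes, and 85 of those boxes correspond to real vehicles. Then TP = 85, FP = 10 (boxes with no real vehicle) and FN = 15 (vehicles that were missed). The function name below is hypothetical.

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Return (precision, recall) from detection counts.

    precision = TP / (TP + FP): share of reported boxes that are real vehicles.
    recall    = TP / (TP + FN): share of real vehicles that were found.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts: 100 real vehicles, 95 reported boxes, 85 of them correct.
precision, recall = detection_metrics(tp=85, fp=10, fn=15)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
# prints "precision = 0.895, recall = 0.850"
```

Under these assumed counts, about 89.5% of the reported boxes are real vehicles and 85% of the real vehicles are found. mAP extends this idea: for each class, precision is averaged over the range of recall values obtained by varying the confidence threshold (the area under the precision-recall curve), and the result is averaged over classes, with a detection counted as correct only if its box overlaps the ground truth by at least a chosen IoU threshold.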

Comments on the Quality of English Language

Moderate changes corrected in my detailed comments

Author Response

Thank you for the comments. Please check the uploaded rebuttal.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have addressed all my concerns; the paper is suitable for publication.

Reviewer 4 Report

Comments and Suggestions for Authors

The authors have improved the manuscript and addressed all issues raised by my original report. The paper can be published as it stands.
