Kalman Filtering and Bipartite Matching Based Super-Chained Tracker Model for Online Multi Object Tracking in Video Sequences
Round 1
Reviewer 1 Report
In this article, the authors propose a Super Chained Tracker (SCT) model, which is convenient and online and provides better results when compared with existing MOT methods. The proposed model combines the subtasks of object detection, feature manipulation, and representation learning into one end-to-end solution. It takes adjacent frames as input, converts each frame into pairs of bounding boxes, and chains them up with Intersection over Union (IoU), Kalman filtering, and bipartite matching.
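For context, the adjacent-frame chaining summarized above (scoring box pairs by IoU, then linking them across frames) could be sketched roughly as follows. This is an illustrative simplification only: it uses a greedy assignment in place of the full bipartite matching and Kalman filtering described in the paper, and all function names and the threshold value are assumptions, not the authors' code.

```python
def iou(a, b):
    # a, b: axis-aligned boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def chain_pairs(prev_boxes, curr_boxes, thresh=0.5):
    # Greedy stand-in for bipartite matching: link each previous box
    # to the unclaimed current box with the highest IoU above thresh.
    links, used = [], set()
    for i, p in enumerate(prev_boxes):
        scores = [(iou(p, c), j) for j, c in enumerate(curr_boxes)
                  if j not in used]
        if not scores:
            continue
        best, j = max(scores)
        if best >= thresh:
            links.append((i, j))   # identity of track i continues as box j
            used.add(j)
    return links
```

In the actual model, an optimal bipartite assignment (e.g. the Hungarian algorithm) would replace the greedy loop, and Kalman-predicted boxes would stand in for the raw previous-frame boxes.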
The paper is well structured and well-written, but there are some aspects to improve, like:
- In Section 2, when the authors discuss other works, it is crucial to show the main differences between the previous work and your proposal.
- Figures 4 and 5 have the same caption; is this a mistake? If not, you should join both figures into one.
- The same as before, but with figures 6 & 7, and 8 & 9.
- What is the delay of your proposal compared with other solutions? You require previous frames, so could this be a problem in a real-time scenario?
- A comparison of time, CPU requirements, etc. could be interesting in order to reveal further trade-offs between your proposal and other solutions.
Author Response
Reviewer 1:
Comment
“The paper is well structured and well-written, but there are some aspects to improve, like:
- In Section 2, when the authors discuss other works, it is crucial to show the main differences between the previous work and your proposal.”
Response:
We are thankful to the worthy Reviewer for encouraging us in our work and highlighting some important points to further improve the article. We have added the details as suggested in the referenced section. Please see modifications at:
(please see page 5, line 178 of the revised manuscript as yellow highlighted text)
Comment
“- Figures 4 and 5 have the same caption; is this a mistake? If not, you should join both figures into one.”
Response:
We modified the captions of Figures 4 and 5 according to the Reviewer’s comments. The modifications may please be seen at:
Figure 4: (please see page 13, line 399 of the revised manuscript as yellow highlighted text)
Figure 5: (please see page 14, line 402 of the revised manuscript as yellow highlighted text)
Comment
“- The same as before, but with figures 6 & 7, and 8 & 9”
Response:
We have modified the article in the light of the worthy Reviewer’s comments. The details may please be seen at:
Figure 6: (please see page 14, line 403 of the revised manuscript as yellow highlighted text)
Figure 7: (please see page 14, line 404 of the revised manuscript as yellow highlighted text)
Figure 8: (please see page 15, line 405 of the revised manuscript as yellow highlighted text)
Figure 9: (please see page 15, line 407 of the revised manuscript as yellow highlighted text)
Comment
“- What is the delay of your proposal compared with other solutions? You require previous frames, so could this be a problem in a real-time scenario?”
Response:
We are thankful to the worthy Reviewer for highlighting this important point to us. The interval between adjacent frames is defined by the dataset provider (https://motchallenge.net/data/MOT16/). Suppose the video is recorded at 30 fps; then the tracking is based on the information provided by contiguous frames. If the frame rate is increased (say, to 45 fps), additional frames exist within each 1/30 s interval, and the tracking results will be influenced by these extra frames. We have added the relevant detail in the article.
(please see page 2, line 63 of the revised manuscript as yellow highlighted text)
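As an aside, the frame-rate dependence described in this response can be illustrated with a minimal constant-velocity Kalman prediction step, where the time step dt = 1/fps shrinks as the frame rate grows, so the per-frame motion between adjacent frames shrinks with it. This is a generic sketch under assumed parameters (state layout, process-noise scale), not the manuscript's exact filter.

```python
import numpy as np

def predict(x, P, fps, q=1e-2):
    # One Kalman prediction step over a single frame interval.
    # x = [position, velocity]; P is the 2x2 state covariance.
    dt = 1.0 / fps
    F = np.array([[1.0, dt],     # position advances by velocity * dt
                  [0.0, 1.0]])   # velocity assumed constant
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],   # white-noise-acceleration
                      [dt**2 / 2, dt]])          # process noise, scaled by dt
    return F @ x, F @ P @ F.T + Q
```

For example, an object moving at 90 px/s advances 3 px between adjacent frames at 30 fps, but only 2 px at 45 fps, so chaining over contiguous frames implicitly adapts to the dataset's frame rate.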
Comment
“- A comparison of time, CPU requirements, etc. could be interesting in order to reveal further trade-offs between your proposal and other solutions.”
Response:
We are indebted to the reviewer for sharing the potential improvements to make the article more valuable. The additions have been made, and may please be viewed at:
(please see page 12, line 375 of the revised manuscript as yellow highlighted text)
(please see page 17, line 454 of the revised manuscript as yellow highlighted text)
Reviewer 2 Report
The article addresses a Kalman filtering and bipartite matching based Super-Chained Tracker model for online multi-object tracking in video sequences.
1- The abstract should be modified. It can be rewritten to be shorter but with more details and some numerical results.
I suggest you structure your abstract as presented in https://www.principiae.be/pdfs/UGent-X-003-slideshow.pdf
2- The Literature Review starts with extra information about object tracking, which should be supported by references. The section can be made shorter, with more focus on the related works. Many paragraphs could cite supporting references.
3- Even though I understand that the results are scientifically sound, I recommend the authors do a major revision of their text, which I believe can be summarized for the benefit of the readers.
4- Some of the formulas should be explained better in the text, for example Eqs. 2 and 3.
5- Figures 2 and 3 should be explained in a better way. There are some details which have not been explained in the article. Also, if they are copied from other articles or webpages, they should be referenced.
6- Simulation scenarios are not well discussed. The approaches were illustrated only on some specific simulations, which is not enough to draw a complete and accurate conclusion about the method.
7- The evaluations are not enough. The proposed method should be compared with some well-known methods in this area.
8- Please do not forget that clarity and good structure are important factors in the review decision. Please read the paper carefully (again) and correct its English.
Author Response
Reviewer 2:
Comment
“1- The abstract should be modified. It can be rewritten to be shorter but with more details and some numerical results.
I suggest you structure your abstract as presented in https://www.principiae.be/pdfs/UGent-X-003-slideshow.pdf”
Response:
We are thankful to the worthy Reviewer for highlighting this important point to us. We have thoroughly gone through the referenced PDF file and restructured the abstract accordingly, following “Context, Need, Task, Object, Findings, Conclusion, and Perspectives”. The changes in the article may please be viewed at:
(please see page 1, line 27 of the revised manuscript as yellow highlighted text)
Comment
“2- The Literature Review starts with extra information about object tracking, which should be supported by references. The section can be made shorter, with more focus on the related works. Many paragraphs could cite supporting references.”
Response:
We modified the Literature Review as guided by the worthy Reviewer. The modifications may please be seen at:
(please see page 3, line 76 of the revised manuscript as yellow highlighted text)
Comment
“3- Even though I understand that the results are scientifically sound, I recommend the author to do a major review on their text, which I believe can be summarized in benefit of the readers.”
Response:
We have summarized the text and thoroughly reviewed the article, and changes have been made accordingly in the following places:
(please see page 1, line 27 of the revised manuscript as yellow highlighted text)
(please see page 2, line 41 of the revised manuscript as yellow highlighted text)
(please see page 2, line 63 of the revised manuscript as yellow highlighted text)
(please see page 3, line 76 of the revised manuscript as yellow highlighted text)
(please see page 5, line 178 of the revised manuscript as yellow highlighted text)
(please see page 12, line 375 of the revised manuscript as yellow highlighted text)
(please see page 17, line 454 of the revised manuscript as yellow highlighted text)
Comment
“4- Some of the formulas should be explained better in the text, for example Eqs. 2 and 3.”
Response:
In Section 3.4.4, we have defined the loss function and the detailed methodology of assigning labels, where the matching of bounding boxes is discussed. In Equation 2, predictions are matched to their corresponding ground-truth bounding boxes by defining a threshold. In Equation 3, focal losses are defined for both classification and prediction, where the delta symbol denotes the regression and ground-truth offsets with respect to their chained anchor; the loss for paired-box regression is therefore the L1 loss. Finally, the total loss for SCT is defined to address the class-imbalance problem.
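For illustration only, the structure of such a loss (focal terms for classification plus an L1 term for paired-box regression relative to the chained anchor) can be sketched as below. The alpha and gamma values and the equal weighting of the terms are assumptions, not the manuscript's exact formulation.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # Focal loss for one predicted probability p with binary label y.
    # The (1 - pt)^gamma factor down-weights easy, well-classified
    # examples, countering the class-imbalance problem.
    pt = p if y == 1 else 1.0 - p
    a = alpha if y == 1 else 1.0 - alpha
    return -a * (1.0 - pt) ** gamma * np.log(pt)

def l1_reg_loss(pred_offsets, gt_offsets):
    # L1 loss between predicted and ground-truth box offsets,
    # both expressed relative to the chained anchor.
    return float(np.abs(np.asarray(pred_offsets) - np.asarray(gt_offsets)).sum())

def total_loss(p_cls, y_cls, p_id, y_id, pred_off, gt_off):
    # Total loss: focal terms for the two classification branches
    # plus L1 for paired-box regression (weights assumed equal here).
    return (focal_loss(p_cls, y_cls) + focal_loss(p_id, y_id)
            + l1_reg_loss(pred_off, gt_off))
```

The focal term grows for hard examples (low pt) and vanishes for easy ones, which is what makes the total loss robust to the foreground/background imbalance typical of dense anchor grids.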
Comment
“5- Figures 2 and 3 should be explained in a better way. There are some details which have not been explained in the article. Also, if they are copied from other articles or webpages, they should be referenced.”
Response:
Many papers are based on similar working principles, whereas we have stated the novel notion of our work explicitly at multiple locations in the article. Figure 2 demonstrates the architecture of the proposed system: feature extraction is performed using ResNet50 and an FPN, after which the features are concatenated and used for the prediction of bounding boxes. Figure 3 represents the memory mechanism for storing the information of the current frame and reusing these features until the next node is processed.
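A minimal sketch of such a memory mechanism, assuming a simple cache keyed by frame index (illustrative only; the class and method names are assumptions, not the authors' implementation):

```python
class FeatureMemory:
    # Features extracted for frame t are cached and reused when frame t
    # reappears as the first element of the next adjacent pair (t, t+1),
    # so each frame passes through the backbone only once.
    def __init__(self):
        self._cache = {}   # frame index -> feature tensor

    def get_or_compute(self, t, extract):
        # extract(t) runs the backbone (e.g. ResNet50 + FPN) on frame t;
        # it is invoked only on a cache miss.
        if t not in self._cache:
            self._cache[t] = extract(t)
        return self._cache[t]

    def release_before(self, t):
        # Drop features no longer needed once the tracker moves past t.
        for k in [k for k in self._cache if k < t]:
            del self._cache[k]
```

With this pattern, processing the pair (t, t+1) computes features for t+1 only, reusing the cached features of t from the previous step.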
Comment
“6- Simulation scenarios are not well discussed. The approaches were illustrated only on some specific simulations, which is not enough to draw a complete and accurate conclusion about the method.”
Response:
Most respectfully, it is stated that we have used the word “testing”, as is customary in deep learning, instead of “simulation”. The results of testing have been given both qualitatively (Section 4.1) and quantitatively (Section 4.2). The comparison of SCT with the results of other frameworks is illustrated in Tables 1-4. Further, a concise yet comprehensive temporal-complexity analysis, along with its implications, has been incorporated, with the results illustrated in Table 5.
Comment
“7- The evaluations are not enough. The proposed method should be compared with some well-known methods in this area.”
Response:
We are highly obliged to the worthy Reviewer for guiding us on this issue. We have added another section comparing methods on the basis of time complexity. The modifications may please be viewed at:
(please see page 17, line 454 of the revised manuscript as yellow highlighted text)
(please see page 18, line 461 of the revised manuscript as yellow highlighted text)
Reviewer 3 Report
1. Lit review Section 2 is too long; could you please make it briefer?
2. Mention the computational complexity of SCT (compute, storage, memory, etc.)
3. Mention evaluation metrics around error rate and the accuracy of SCT vs. SOTA
Author Response
Reviewer 3:
Comment
“1. Lit review Section 2 is too long; could you please make it briefer?”
Response:
We thank the worthy Reviewer for pointing out this modification. We have thoroughly revised the Literature Review, and its condensed form may please be viewed at:
(please see page 3, line 76 of the revised manuscript as yellow highlighted text)
(please see page 4, line 110 of the revised manuscript as yellow highlighted text)
(please see page 6, line 178 of the revised manuscript as yellow highlighted text)
Comment
“2. Mention the computational complexity of SCT (compute, storage, memory, etc.)”
Response:
We have added a section for a comparison of SCT on the basis of FPS and computational resources. The modifications may please be viewed at:
(please see page 17, line 454 of the revised manuscript as yellow highlighted text)
(please see page 18, line 461 of the revised manuscript as yellow highlighted text)
Comment
“3. Mention evaluation metrics around error rate and the accuracy of SCT vs. SOTA”
Response:
Arrow symbols in Tables 1-4 are used with the evaluation metrics to indicate the error rates: the down arrow indicates loss and the up arrow shows accuracy. The evaluation metrics for the existing models are also shown in Tables 1-4. The associated details of the evaluation metrics have been presented in Section 3.5.
Round 2
Reviewer 1 Report
The authors have improved the paper according to the previous comments.
The only advice for the authors is to try to improve the quality of figures 1, 2, and 3. They could use a vector format.
Author Response
Reviewer 1:
Comment
“The authors have improved the paper according to the previous comments.”
Response:
We are indebted to the reviewer for encouraging us on our work.
Comment
“The only advice for the authors is to try to improve the quality of figures 1, 2, and 3. They could use a vector format.”
Response:
We have improved the quality of Figures 1, 2 and 3; EPS and TIFF versions of the files have been attached.
Reviewer 2 Report
The quality of Figures 1, 2 and 3 is very low. They have to be replaced with high-quality figures.
Author Response
Reviewer 2:
Comment
“The quality of Figures 1, 2 and 3 is very low. They have to be replaced with high-quality figures.”
Response:
We are thankful to the Reviewer for pointing out the deficiency in the figures. High-quality versions of the figures have been attached in EPS and TIFF formats.