Article
Peer-Review Record

Deep Clustering Efficient Learning Network for Motion Recognition Based on Self-Attention Mechanism

Appl. Sci. 2023, 13(5), 2996; https://doi.org/10.3390/app13052996
by Tielin Ru 1,* and Ziheng Zhu 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 12 January 2023 / Revised: 17 February 2023 / Accepted: 21 February 2023 / Published: 26 February 2023

Round 1

Reviewer 1 Report

In this work, the authors propose a deep clustering learning network for motion recognition based on the self-attention mechanism, which effectively addresses the accuracy and efficiency problems of sports event analysis and judgment, and they demonstrate the effectiveness and feasibility of the method for sports video analysis and reasoning through experiments. In light of the following comments, my decision is major revision. If the following issues are resolved, I will recommend that the manuscript be published in this journal.

 

#Strength:

+(1) Through the LSTM, the network not only mitigates the vanishing and exploding gradient problems of the recurrent neural network (RNN), but also captures the internal correlations among the multiple people on the sports field for identification;

 

+(2) Building on (1), the DEC is added to integrate the motion encoding information of key frames, improving judgment efficiency;

 

+(3) With the self-attention mechanism, the network can analyze the whole sports video macroscopically while also focusing on the specific attributes of the movement to capture the more important details. It extracts and further enhances the key posture features of athletes, effectively reducing the parameters of the self-attention mechanism and its computational complexity while maintaining the ability to capture details, thereby improving the accuracy and efficiency of reasoning and judgment. Verification on large video datasets of mainstream sports shows high accuracy and improved detection and recognition efficiency.
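For context, the following is a minimal NumPy sketch of standard scaled dot-product self-attention over a sequence of frame features. It is only an illustrative baseline under assumed names and shapes, not the authors' reduced-parameter variant described in the manuscript.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention.

    X:  (T, d_model) sequence of per-frame (or per-pose) features
    Wq, Wk, Wv: (d_model, d_k) projection matrices
    Returns a (T, d_k) sequence of context-aware features.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise relevance between time steps
    weights = softmax(scores, axis=-1)        # attention distribution for each query
    return weights @ V                        # weighted summary over the whole sequence

# Toy usage with random features (hypothetical sizes).
T, d_model, d_k = 10, 32, 16
rng = np.random.default_rng(0)
X = rng.standard_normal((T, d_model))
Wq, Wk, Wv = (0.1 * rng.standard_normal((d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (10, 16)
```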

 

#Weaknesses:

-(1) Please provide an example of a paper on the detection and recognition of multi-person behavior events, and describe that paper's method.

 

-(2) In Figure 1, there should be a general title rather than a direct description.

 

-(3) In Tables 2 and 3, the experimental results of this paper should be set in bold.

 

-(4) Line 48, Literature[11,12]->The literature[11,12]

 

-(5) Line 210, , And iteratively->, and iteratively

-(6) Line 229, , At the same time->, at the same time

 

-(7) Line 323, ,M describes->, M describes

 

-(8) Line 325, ,->, 

 

-(9) Line 330, ,and->, and    .Finally->. Finally

 

Author Response

Reviewer 1:

(1)

This article describes a local group relationship analysis method for group activity recognition. The authors propose to solve human group activity recognition by using local group relationships: instead of analyzing the motion of each individual, human objects are first grouped into local groups to represent the relationships in the entire scene. By modeling each human motion and each local group relationship, the important motion information is maximized. A gated recurrent unit (GRU) model is used to handle trajectory information of arbitrary length with nonlinear hidden units. In experiments on a public human group activity dataset, the performance of the proposed method is compared with other competitive methods and shown to be superior. The paper thus proposes a new feature descriptor for recognizing group activities in surveillance video: multiple human objects are grouped into local groups and their relationships are explicitly modeled, while the GRU model captures the temporal dynamics of multiple relationships of different lengths. The paper demonstrates that the proposed local group feature is effective by outperforming competing methods, and plans to extend the method to handle appearance features and various scenes.

 

(2)-(9)

Corrected.

 

Reviewer 2 Report

This paper proposes a deep clustering learning network for motion recognition under the self-attention mechanism, which is used to solve the accuracy and efficiency problems of sports event analysis and judgment. This work is interesting. However, it requires significant improvements with respect to the following points.

#1- Please read the whole paper again and correct the possible typos. Some of them are highlighted in the attached file.

#2- The mathematical formulation of the considered problem is missing or unclear!

#3- The authors are encouraged to provide the pseudocode of the given method.

#4- All experimental parameters should be provided.

#5- The experimental results and their comparison are not presented with statistical evaluation.

#6- Some remarks on the main results would be necessary and helpful!

Comments for author File: Comments.pdf

Author Response

Reviewer 2:

(1)#1- Please read the whole paper again and correct the possible typos. Some of them are highlighted in the attached file.

Reply: Thank you very much for your suggestion. We have reviewed the full paper and made the corrections.

(2) The mathematical formulation of the considered problem is missing or unclear!

Reply: Thank you very much for your suggestion. We have read many references and, drawing on the methods in the literature, re-derived the formulas in this paper one by one and explained the meaning of each parameter, making the article fuller and smoother.

 

(3) The authors are encouraged to provide the pseudocode of the given method.

Reply: Thank you very much for your suggestion. We have supplemented the pseudocode of the method in this article and highlighted it, making the method clearer and easier for readers to follow.

 

(4) All experimental parameters should be provided.

Reply: Thank you very much for your suggestion. We have reorganized the experimental section, explained the metrics used in the experimental comparison, and supplemented the explanation of the formula principles in the text to make the article more convincing and easier to read.

 

(5) The experimental results and their comparison are not presented with statistical evaluation.

Reply: Thank you very much for your suggestion. In the experimental section, we focus mainly on motion detection and recognition, progressing from simple pedestrian motion detection to more complex motion behavior detection, and the targeted comparison metrics show improvements in both accuracy and recall (Line 452).

 

(6) Some remarks on the main results would be necessary and helpful!

Reply: Thank you very much for your suggestion. For the main experimental results, we have added the necessary remarks in the discussion and conclusion, which is important for the completeness of the article.

Reviewer 3 Report

This paper provides only a very brief illustration of the proposed method, without details of its specific architecture or design. There is no way to reproduce the method presented in this paper.

The experiments are designed with very abnormal training settings. The misuse of key terms, such as automatic encoder, epochal, 100 times of training, etc., makes the study unreliable. More clarification is required.

The accuracies of the proposed method presented in Tables 2 and 3 do not agree with the confusion matrix in Figure 8. How were the accuracies for the benchmarking models in Tables 2 and 3 obtained? I suggest the authors provide the source of the implementations for the benchmarking models.

Author Response

Reviewer 3:

(1) This paper provides only a very brief illustration of the proposed method, without details of its specific architecture or design. There is no way to reproduce the method presented in this paper.

 

Reply: Thank you very much for your suggestion. In response to this problem, we have drawn on several top computer vision journals and conferences, supplemented the experimental description of the method, and added pseudocode annotation in the article (Line 362) to further ensure the reproducibility of the proposed method. The code and dataset are available at https://github.com/rtl2023/Behavior-detection.

 

(2) The experiments are designed with very abnormal training settings. The misuse of key terms, such as automatic encoder, epochal, 100 times of training, etc., makes the study unreliable. More clarification is required.

 

Reply: Thank you very much for your suggestion. As you pointed out, we misused some key terms, and some words or numbers were missing. We have rechecked the experimental section and corrected these errors. Some of the experimental descriptions were also stated incorrectly; we have revised them and invited Professor Chen to polish our paper.

 

(3) The accuracies of the proposed method presented in Tables 2 and 3 do not agree with the confusion matrix in Figure 8. How were the accuracies for the benchmarking models in Tables 2 and 3 obtained? I suggest the authors provide the source of the implementations for the benchmarking models.

 

Reply: Thank you very much for your suggestion. We have conducted experiments on four datasets and checked the data in detail. The benchmark models were reproduced from official or third-party open-source implementations, or from the experimental settings described in the original method papers.

 

Reference:

[1] Zhang, Yunhua, et al. "Audio-adaptive activity recognition across video domains." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

 

[2] Shu, Xiangbo, et al. "Hierarchical long short-term concurrent memory for human interaction recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 43.3 (2019): 1110-1118.

 

[3] Ijaz, Momal, Renato Diaz, and Chen Chen. "Multimodal transformer for nursing activity recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

 

[4] Kim, Dongkeun, et al. "Detector-free weakly supervised group activity recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

 

[5] Han, Mingfei, et al. "Dual-AI: dual-path actor interaction learning for group activity recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

 

[6] Doshi, Keval, and Yasin Yilmaz. "Federated learning-based driver activity recognition for edge devices." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

 

[7] Vats, Arpita, and David C. Anastasiu. "Key point-based driver activity recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

 

[8] https://github.com/rtl2023/Behavior-detection

Round 2

Reviewer 2 Report

The authors have replied to my comments properly. However, there are still many grammatical errors. Some of these grammatical errors are listed below, indicated by line number:

- Line 13, fusing-> using

- Line 17 , The-> . The

- Line 83 However,-> however,

- Line 44 , Researchers-> , researchers

- Line 95 , Improve-> , improve

- Line 120 ; By-> . By 

- Line 138 .In-> . In

- Line 158 , Finally,-> , finally,

- Line 175 , From-> . From

- Line 178 .In-> . In

- Line 201 .The-> . The

- Line 206 , The-> . The 

- Line 217 ,b1-> , b1

- Line 217 ,W1-> , W1

- Line 236 ,In-> . In

- Line 240 ,which-> , which

- Line 278 ; Channel-> . Channel

- Line 291 .A feature-> . A feature

- Line 299 Relu-> ??

- Line 321 ,where-> , where

- Line 321 ,Then-> , then

- Line 321 ,and-> , and

- Line 322 ,where-> , where

- Line 331 ,and-> , and

- Line 335 ,In-> . In

- Line 340 :Concat()-> : Concat()

- Line 347 .At-> . At 

- Line 352 .E-> . E

- Line 355 self- attention-> self-attention

- Line 367 2.40GHz-> 2.40 GHz

- Line 367 8GB-> 8 GB

- Line 369 .The-> . The

- Line 372  118.8GB->  118.8 GB

- Line 372 ,including-> , including

- Line 385 set 0.001-> set to 0.001

- Line 431 Figure 6:-> Figure 6.

- Line 450 .We->. We

- Line 452 ,the-> , the

- Line 453 Figure 7:-> Figure 7.

- Line 466 , By-> . By

- Line 486 , This-> . This

- Line 490 ; In order to-> . In order to

- Line 502 ; The-> . The

Author Response

Dear Editors and Reviewers,

Thank you for your letter and for the reviewers' comments. Those comments are all valuable and very helpful for revising and improving our paper, and they are also important for guiding our research. We have studied the comments carefully and have made corrections which we hope will meet with your approval. The revised parts are marked in the new manuscript.

 

#Response to Reviewers

 

Reviewer 2:

Thank you very much for your suggestions. We have carefully corrected every one of the grammatical and formatting errors listed above (Line 13 through Line 502) in the revised manuscript.

 

Reviewer 3 Report

The response from the authors still did not provide the requested design and experiment details.

1) The pseudocode added by the authors does not provide details of the network design, such as the layer configurations of the modules in the proposed method. Such information is critical for understanding the design of the proposed method.

2) The added references for the benchmarking methods do not link to official/public code implementations, just the paper publications. The GitHub repo of the authors' method links to an emotion/personality analysis task, which is not the motion recognition task presented in the manuscript.

Author Response

Dear Editors and Reviewers,

Thank you for your letter and for the reviewers' comments. Those comments are all valuable and very helpful for revising and improving our paper, and they are also important for guiding our research. We have studied the comments carefully and have made corrections which we hope will meet with your approval. The revised parts are marked in the new manuscript.

Reviewer 3

The response from the authors still did not provide the requested design and experiment details.

 

Thank you very much for your suggestion. We have carefully read and checked the experimental part of the original text again. The detailed implementation steps are reflected in the overall experimental flow chart, and we have added corresponding explanations in the "network model performance comparison" module, providing further details to help readers read and understand.

 

1) The pseudocode added by the authors does not provide details of the network design, such as the layer configurations of the modules in the proposed method. Such information is critical for understanding the design of the proposed method.

 

Thank you very much for your suggestion. We have carefully read and analyzed the pseudocode section again, and we will continue to supplement the network design details in the pseudocode for readers to understand. Briefly, the details are as follows: first, to address the vanishing gradient problem, a gradient propagation model for the corresponding cell memory state is introduced; second, the information is loaded into the long-term memory unit, and the saturation problem of the activation function is handled through the forget gate; then, the memory selected for output is passed through the output gate, which addresses the reduced controllability of the gate unit; finally, the hidden state is introduced to transform the simple feedback structure of the neural network into a fuzzy history-memory structure, completing the construction.
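To make the gating described above concrete, here is a minimal NumPy sketch of a standard LSTM cell with input, forget, and output gates; the parameter shapes and names are our own assumptions for illustration and do not reflect the exact layer configuration used in the manuscript.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x_t:            (d_in,)  input at time t
    h_prev, c_prev: (d_h,)   previous hidden and cell (long-term memory) states
    W, U, b:        stacked parameters for the input/forget/output gates and the
                    candidate memory, shapes (4*d_h, d_in), (4*d_h, d_h), (4*d_h,)
    """
    d_h = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0 * d_h:1 * d_h])   # input gate: how much new information to write
    f = sigmoid(z[1 * d_h:2 * d_h])   # forget gate: how much old memory to keep
    o = sigmoid(z[2 * d_h:3 * d_h])   # output gate: how much memory to expose
    g = np.tanh(z[3 * d_h:4 * d_h])   # candidate memory content
    c_t = f * c_prev + i * g          # additive cell-state update eases gradient flow
    h_t = o * np.tanh(c_t)            # hidden state passed on to the next time step
    return h_t, c_t

# Toy usage with random parameters (hypothetical sizes).
d_in, d_h = 8, 16
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4 * d_h, d_in))
U = 0.1 * rng.standard_normal((4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
for t in range(5):
    h, c = lstm_cell(rng.standard_normal(d_in), h, c, W, U, b)
print(h.shape, c.shape)  # (16,) (16,)
```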

 

 

2) The added references for the benchmarking methods do not link to official/public code implementations, just the paper publications. The GitHub repo of the authors' method links to an emotion/personality analysis task, which is not the motion recognition task presented in the manuscript.

 

Thank you very much for your suggestion. We have checked the code again. Because of the large amount of code in the folder, it was easy to become confused, so we have reorganized the code and re-uploaded it. We have also surveyed many top journals and selected some representative papers for introduction and elaboration. Owing to copyright restrictions, we do not include the relevant code of these methods and instead introduce them in the form of references; for a deeper understanding, readers can follow the references to the original papers.

The link is as follows: https://github.com/rtl2023/detection

Round 3

Reviewer 3 Report

The authors have answered the queries.
