Convolutional Non-local Spatial-Temporal Learning for Multi-Modality Action Recognition
  • Ziliang Ren,
  • Huaqiang Yuan,
  • Wenhong Wei,
  • Tiezhu Zhao,
  • Qieshi Zhang
Ziliang Ren
Dongguan University of Technology - City College

Corresponding Author: renzl@dgut.edu.cn

Huaqiang Yuan
Dongguan University of Technology
Wenhong Wei
Dongguan University of Technology
Tiezhu Zhao
Dongguan University of Technology
Qieshi Zhang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

Abstract

Traditional deep convolutional networks (ConvNets) have shown that the RGB and depth modalities are complementary for video action recognition. However, recognition accuracy is difficult to improve because a single ConvNet is limited in its ability to extract the underlying relationships and complementary features between these two modalities. In this paper, we propose a novel two-stream ConvNet for multi-modality action recognition that uses joint optimization learning to extract global features from RGB and depth sequences. Specifically, a non-local multi-modality compensation block (NL-MMCB) is introduced to learn semantic fusion features that improve recognition performance. Experimental results on two multi-modality human action datasets, NTU RGB+D 120 and PKU-MMD, verify the effectiveness of the proposed recognition framework and demonstrate that the proposed NL-MMCB can learn complementary features and enhance recognition accuracy.
05 Jul 2022: Submitted to Electronics Letters
06 Jul 2022: Submission Checks Completed
06 Jul 2022: Assigned to Editor
18 Jul 2022: Reviewer(s) Assigned
19 Jul 2022: Review(s) Completed, Editorial Evaluation Pending
24 Jul 2022: Editorial Decision: Revise Minor
29 Jul 2022: 1st Revision Received
29 Jul 2022: Submission Checks Completed
29 Jul 2022: Assigned to Editor
29 Jul 2022: Review(s) Completed, Editorial Evaluation Pending
02 Aug 2022: Editorial Decision: Accept