Multiscale Feature Fusion Network for Monocular Complex Hand Pose Estimation
  • Zhi Zhan,
  • Guang Luo
Zhi Zhan
Guangdong Engineering Polytechnic
Guang Luo
South China Normal University

Corresponding Author: luoguang_arts@163.com

Abstract

Hand pose estimation from a single RGB image suffers from low accuracy because of pose complexity, local self-similarity of finger features, and occlusion. To address this problem, a multiscale feature fusion network (MS-FF) for monocular hand pose estimation is proposed. The network exploits information across channels to enhance important hand features, and it simultaneously extracts features from feature maps of different resolutions to capture as much fine detail and deep semantic information as possible. The feature maps are then merged to obtain the hand pose results. The MS-FF is trained on the InterHand2.6M dataset and the Rendered Handpose Dataset (RHD). Compared with other methods that can estimate interacting hand poses from a single RGB image, the MS-FF obtains the smallest average hand joint error on RHD, verifying its effectiveness.
14 Oct 2023: Submitted to Electronics Letters
16 Oct 2023: Submission Checks Completed
16 Oct 2023: Assigned to Editor
18 Oct 2023: Reviewer(s) Assigned
02 Nov 2023: Review(s) Completed, Editorial Evaluation Pending
03 Nov 2023: Editorial Decision: Revise Major
12 Nov 2023: 1st Revision Received
14 Nov 2023: Submission Checks Completed
14 Nov 2023: Assigned to Editor
14 Nov 2023: Review(s) Completed, Editorial Evaluation Pending
14 Nov 2023: Reviewer(s) Assigned
20 Nov 2023: Editorial Decision: Accept