HTPE-Net: Monocular 6D Pose Estimation of Transparent Objects in Hand for Robot Manipulation
  • Ran Yu, Tsinghua Shenzhen International Graduate School
  • Shoujie Li, Tsinghua Shenzhen International Graduate School
  • Haixin Yu, Tsinghua Shenzhen International Graduate School
  • Wenbo Ding, Tsinghua Shenzhen International Graduate School

Corresponding Author: ding.wenbo@sz.tsinghua.edu.cn

Abstract

Transparent objects are difficult to perceive due to their unique optical properties, and the dynamic interaction between the hand and the object further complicates pose estimation. To address these challenges, we propose HTPE-Net, a monocular instance-level 6D pose estimation method for hand-held transparent objects that tackles texture-less, non-Lambertian surfaces and hand-object occlusions. HTPE-Net integrates hand and object features through a dual-stream feature extraction backbone and a hand-to-object feature enhancement module, generating geometric features and hand attention maps that improve robustness to background changes and occlusions. The network is trained on a modified version of the TransHand-14K dataset and outperforms state-of-the-art methods. A sim-to-real experiment further validates the practical applicability of HTPE-Net in real-world robot perception tasks. The proposed approach significantly advances the accuracy and robustness of 6D pose estimation for hand-held transparent objects, with potential applications in robotics, human-machine interaction, and augmented reality.
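The core fusion idea described in the abstract, two feature streams with the hand stream gating the object stream via an attention map, can be sketched at a toy scale. This is a minimal illustrative sketch, not the authors' implementation: the linear projections stand in for the dual-stream backbone, and `hand_to_object_enhancement` is a hypothetical name for a softmax-gated residual fusion assumed here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image, w):
    # Toy "stream": a single linear projection standing in for a CNN backbone branch.
    return image @ w

def hand_to_object_enhancement(obj_feat, hand_feat):
    # Hand attention map: softmax over hand features gates the object features
    # through a residual multiplicative enhancement.
    attn = np.exp(hand_feat) / np.exp(hand_feat).sum(axis=-1, keepdims=True)
    return obj_feat * (1.0 + attn)

# Dummy input: one flattened image crop fed to both streams.
image = rng.standard_normal((1, 64))
w_obj = rng.standard_normal((64, 16))   # object-stream weights
w_hand = rng.standard_normal((64, 16))  # hand-stream weights

obj_feat = extract_features(image, w_obj)    # object stream
hand_feat = extract_features(image, w_hand)  # hand stream
fused = hand_to_object_enhancement(obj_feat, hand_feat)
print(fused.shape)  # (1, 16)
```

In the actual network these fused features would feed a pose head that regresses the 6D pose; the residual form keeps the object features intact where hand attention is weak, which matches the stated goal of robustness to occlusion.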
02 Dec 2024: Submitted to Journal of Field Robotics
03 Dec 2024: Submission Checks Completed
03 Dec 2024: Assigned to Editor
03 Dec 2024: Review(s) Completed, Editorial Evaluation Pending
19 Dec 2024: Reviewer(s) Assigned