HTPE-Net: Monocular 6D Pose Estimation of Transparent Objects in Hand
for Robot Manipulation
- Ran Yu
- Shoujie Li
- Haixin Yu
- Wenbo Ding

Tsinghua Shenzhen International Graduate School
Corresponding author: ding.wenbo@sz.tsinghua.edu.cn

Abstract
Transparent objects are difficult to perceive due to their unique
optical properties, and dynamic hand-object interaction further
complicates pose estimation. To address these challenges, we propose
HTPE-Net, a monocular instance-level 6D pose estimation method for
hand-held transparent objects that handles texture-less,
non-Lambertian surfaces and hand-object occlusions. HTPE-Net
integrates hand and object features through a dual-stream feature
extraction backbone and a hand-to-object feature enhancement module,
generating geometric features and hand attention maps that improve
robustness to background changes and occlusions. The network is
trained on a modified version of the TransHand-14K dataset and
outperforms state-of-the-art methods. A sim-to-real experiment further
validates the practical applicability of HTPE-Net in real-world robot
perception tasks. The proposed approach advances the accuracy and
robustness of 6D pose estimation for hand-held transparent objects,
with potential applications in robotics, human-machine interaction,
and augmented
reality.

02 Dec 2024: Submitted to Journal of Field Robotics
03 Dec 2024: Submission Checks Completed
03 Dec 2024: Assigned to Editor
03 Dec 2024: Review(s) Completed, Editorial Evaluation Pending
19 Dec 2024: Reviewer(s) Assigned
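The abstract describes a hand-to-object feature enhancement module that derives a hand attention map to reweight the object feature stream. The paper's actual architecture is not given here, so the following is only a minimal NumPy sketch under assumed details: the attention map is taken as a sigmoid over channel-pooled hand features, and fusion is a simple residual reweighting. Function and variable names are hypothetical.

```python
import numpy as np


def hand_to_object_enhancement(obj_feat, hand_feat):
    """Illustrative sketch (not the paper's implementation):
    derive a spatial hand attention map from the hand feature
    stream and use it to reweight the object feature stream.

    obj_feat, hand_feat: arrays of shape (C, H, W).
    Returns enhanced object features of shape (C, H, W).
    """
    # Hypothetical attention map: sigmoid of channel-averaged hand features,
    # giving one (1, H, W) map highlighting hand-related regions.
    pooled = hand_feat.mean(axis=0, keepdims=True)
    attn = 1.0 / (1.0 + np.exp(-pooled))
    # Residual fusion: amplify object features where hand evidence is strong.
    return obj_feat * (1.0 + attn)


# Toy usage with random "features" standing in for backbone outputs.
C, H, W = 8, 4, 4
obj = np.random.rand(C, H, W)
hand = np.random.rand(C, H, W)
out = hand_to_object_enhancement(obj, hand)
```

Since the attention values lie in (0, 1), the residual form `obj * (1 + attn)` only amplifies object features near hand regions rather than suppressing them elsewhere, one common design choice for occlusion-aware fusion.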