Abstract
Human pose estimation based on heatmap regression has achieved
significant success in recent years. However, the semantic ambiguity
caused by traditional hand-crafted heatmaps severely degrades model
performance. Specifically, hand-crafted heatmaps generated with a fixed
Gaussian kernel are semantically misaligned: Gaussian-covered areas that
vary across keypoints of the same type can confuse the model during
learning. In this paper, we focus on learnable heatmap generation and
propose a refined heatmap generator (RHG) to boost human pose
estimation. First, we propose a joint training framework that connects
the human pose estimator and RHG for end-to-end training. It employs a
joint loss function to learn intermediate representations between the
network and the dataset. Second, RHG takes annotated dot points as input
and uses scale-aware heatmaps as regression targets to handle scale
variation. Scale-aware heatmaps are generated by adjusting the
Gaussian-covered areas according to geometric priors. Experimental results show that our
method achieves 72.0% AP on COCO test-dev2017 and 74.0% AP on the
CrowdPose dataset, respectively, outperforming state-of-the-art methods.
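To make the misalignment argument concrete, the sketch below contrasts a standard hand-crafted target (an unnormalized 2D Gaussian with peak value 1 and a fixed sigma) with a scale-aware variant. It is a minimal illustration under stated assumptions, not the learnable RHG described above. The function names, the choice of person scale (e.g., the square root of the bounding-box area in heatmap pixels), the reference scale, and the linear sigma-scaling rule are all hypothetical. "Covered area" is approximated here as the number of pixels whose target value exceeds 0.5.

```python
import math
from typing import List

Heatmap = List[List[float]]


def gaussian_heatmap(width: int, height: int, cx: float, cy: float, sigma: float) -> Heatmap:
    """Hand-crafted target: an unnormalized 2D Gaussian with peak value 1 at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]


def scale_aware_sigma(base_sigma: float, person_scale: float, ref_scale: float) -> float:
    """HYPOTHETICAL geometric prior: scale sigma linearly with person size.

    This rule is an illustrative assumption, not the rule used by RHG.
    """
    return base_sigma * person_scale / ref_scale


def covered_area(heatmap: Heatmap, threshold: float = 0.5) -> int:
    """Number of pixels whose target value exceeds `threshold` (a proxy for the covered area)."""
    return sum(1 for row in heatmap for v in row if v > threshold)


# Two keypoints of the same type: one on a small person, one on a large person.
# person_scale is assumed to be, e.g., sqrt(bounding-box area) in heatmap pixels.
BASE_SIGMA, REF_SCALE = 2.0, 64.0
small_scale, large_scale = 32.0, 128.0

# Fixed kernel: the covered area is identical regardless of person size.
fixed_small = covered_area(gaussian_heatmap(64, 64, 32, 32, BASE_SIGMA))
fixed_large = covered_area(gaussian_heatmap(64, 64, 32, 32, BASE_SIGMA))

# Scale-aware kernel (hypothetical rule): the covered area grows with person size.
aware_small = covered_area(
    gaussian_heatmap(64, 64, 32, 32, scale_aware_sigma(BASE_SIGMA, small_scale, REF_SCALE))
)
aware_large = covered_area(
    gaussian_heatmap(64, 64, 32, 32, scale_aware_sigma(BASE_SIGMA, large_scale, REF_SCALE))
)

print(fixed_small, fixed_large, aware_small, aware_large)
```

With a fixed kernel, a keypoint on a small person and one on a large person receive the same pixel footprint, so the same footprint covers very different portions of each body. Tying sigma to person scale is one simple way to reduce that mismatch. The abstract states that RHG instead learns the adjustment from geometric priors jointly with the pose estimator.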