Fig. 5 Mean per-joint position error (MPJPE) on interacting-hand sequences from various test sets.
We evaluated each method on images of interacting hands. For every joint, the reported error is the mean of the corresponding left-hand and right-hand joint errors. As Fig. 5 shows, joints near the fingertips are harder to predict than joints near the palm. For every joint, our method achieves a lower average error than the compared methods. Fig. 6 shows the hand pose estimation results of PoseNet, InterNet, and MS-FF. Because most hand joints are flexible, interacting hands frequently occlude one another, which makes estimating hand poses from a single RGB image considerably harder. As Fig. 6 shows, our results are better than those of the other methods.
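The per-joint metric described above (each joint's error averaged over the left and right hands) can be sketched as follows. This is a minimal illustration under assumed conventions, not the evaluation code used in this work. The nested `[frame][hand][joint][xyz]` layout, the order of the two hands, and the function name `per_joint_mpjpe` are hypothetical. Coordinates are assumed to be already aligned (for example, root-relative) and in the same unit, such as millimetres.

```python
import math
from typing import List, Sequence

# Hypothetical layout: data[frame][hand][joint] = (x, y, z)
# hand index 0 and 1 are the two hands (order is an assumption and does not affect the mean)
Joints3D = Sequence[Sequence[Sequence[Sequence[float]]]]


def per_joint_mpjpe(pred: Joints3D, gt: Joints3D) -> List[float]:
    """Return one error per joint: the Euclidean error averaged over all frames
    for each hand, then averaged over the two hands."""
    n_frames = len(gt)
    n_joints = len(gt[0][0])
    errors = []
    for j in range(n_joints):
        hand_means = []
        for h in range(2):  # two interacting hands
            total = sum(math.dist(pred[f][h][j], gt[f][h][j]) for f in range(n_frames))
            hand_means.append(total / n_frames)  # per-hand MPJPE for joint j
        errors.append(sum(hand_means) / 2.0)  # average of left- and right-hand errors
    return errors


if __name__ == "__main__":
    # One frame, two hands, two joints per hand; ground truth at the origin.
    gt = [[[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]]
    pred = [[[(3, 4, 0), (1, 0, 0)], [(0, 0, 0), (0, 2, 0)]]]
    print(per_joint_mpjpe(pred, gt))  # -> [2.5, 1.5]
```

Averaging the two per-hand means gives the same value as pooling both hands' errors for a joint, because both hands contribute the same number of frames. A curve like the one in Fig. 5 would plot this list against joint index.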