Gesture recognition is now well developed and widely used in human-computer interaction. The traditional approach relies on wearable devices such as data gloves. With advances in computer vision, gesture-based control and interaction give users a more intuitive experience and make operations easier to perform. However, most vision-based methods depend on special hardware such as depth sensors, which prevents gesture recognition from being deployed widely; tracking the hand and recognizing gestures with an ordinary RGB camera therefore remains a significant problem. Model-driven methods are the lowest-cost option, but they suffer from jitter during the model-matching process. This paper proposes an improved filter that eliminates jitter during gesture recognition and hand tracking for virtual model interaction based on a monocular camera.
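To make the jitter problem concrete, the sketch below shows the simplest form of temporal filtering, an exponential moving average over tracked 2D hand positions. This is only an illustrative baseline, not the improved filter the paper proposes; the function name, the synthetic noise model, and the `alpha` parameter are assumptions for demonstration.

```python
import random

def smooth_trajectory(points, alpha=0.5):
    """Exponentially smooth a sequence of (x, y) hand positions.

    Smaller alpha -> heavier smoothing (less jitter, more lag);
    larger alpha -> more responsive but noisier output.
    """
    if not points:
        return []
    smoothed = [points[0]]
    for x, y in points[1:]:
        px, py = smoothed[-1]
        smoothed.append((alpha * x + (1 - alpha) * px,
                         alpha * y + (1 - alpha) * py))
    return smoothed

# Simulate a stationary hand position corrupted by tracking jitter.
random.seed(0)
noisy = [(100 + random.uniform(-3, 3), 200 + random.uniform(-3, 3))
         for _ in range(50)]
stable = smooth_trajectory(noisy, alpha=0.2)

# After a short warm-up, the smoothed track deviates far less from the
# true position (100, 200) than the raw, jittery measurements do.
jitter_raw = max(abs(x - 100) + abs(y - 200) for x, y in noisy[10:])
jitter_smooth = max(abs(x - 100) + abs(y - 200) for x, y in stable[10:])
print(jitter_smooth < jitter_raw)
```

The trade-off visible here, stability versus responsiveness, is exactly what a more sophisticated filter must balance: a fixed `alpha` that suppresses jitter on a still hand also lags behind fast hand motion.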