Karmick Surana

Theater performers express emotions while acting through a variety of facial expressions and vocal techniques. To improve their skills or prepare for performances, actors rehearse their scripts and practice their facial expressions and dialogue delivery. Computer vision tools can be used to detect emotions from facial expressions, while audio processing tools can analyze vocal characteristics such as tone, pitch, and loudness, providing insight into the emotional delivery of an actor's speech. This research focuses on developing an application that combines computer vision and audio analysis for emotion recognition, aimed at helping actors enhance their skills and performances. The emotion detection tool uses deep learning models to predict emotions from actors' facial expressions and speech. The emotion conveyed through the user's facial expressions is predicted using MediaPipe's FaceMesh model and its blendshape scores, while a pre-trained wav2vec2-based audio emotion classification model predicts the emotion conveyed in the user's speech. The emotions detected in the user's video and speech are logged in a PDF report, which also contains a customized list of links to tutorials from which the user can learn to improve their acting skills. The proposed emotion recognition system was tested with 20 volunteers on both video and audio data: 65% of the participants confirmed that the system accurately identified their facial expressions, and 75% found it useful for assessing their dialogue delivery. Participants appreciated the immediate feedback and detailed PDF reports, which helped them identify their mistakes and enhance their acting skills. This multimodal approach shows promise as a valuable resource for beginner actors seeking to improve their skills.
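
For illustration only, the sketch below shows how such a two-stream pipeline could be wired up in Python, assuming MediaPipe's FaceLandmarker task (which exposes FaceMesh blendshape scores) and a Hugging Face audio-classification pipeline wrapping a wav2vec2-based checkpoint. The checkpoint name, model asset path, and the blendshape-to-emotion rule are placeholders and not the system described in this paper.

```python
# Minimal illustrative sketch (not the authors' implementation).
# Assumes a downloaded MediaPipe "face_landmarker.task" asset and a
# wav2vec2-based emotion checkpoint from the Hugging Face Hub.
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision
from transformers import pipeline

# --- Facial stream: FaceMesh landmarks with blendshape scores ---
face_options = vision.FaceLandmarkerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="face_landmarker.task"),
    output_face_blendshapes=True,
    num_faces=1,
)
face_detector = vision.FaceLandmarker.create_from_options(face_options)

def facial_emotion(image_path: str) -> str:
    """Map blendshape activations to a coarse emotion label.

    The thresholds below are placeholders; the paper's actual
    blendshape-to-emotion mapping is not reproduced here.
    """
    result = face_detector.detect(mp.Image.create_from_file(image_path))
    if not result.face_blendshapes:
        return "no face detected"
    scores = {c.category_name: c.score for c in result.face_blendshapes[0]}
    if scores.get("mouthSmileLeft", 0.0) + scores.get("mouthSmileRight", 0.0) > 1.0:
        return "happy"
    if scores.get("browDownLeft", 0.0) + scores.get("browDownRight", 0.0) > 1.0:
        return "angry"
    return "neutral"

# --- Audio stream: wav2vec2-based speech emotion classifier ---
# "superb/wav2vec2-base-superb-er" is one publicly available emotion
# recognition checkpoint; the paper's exact pre-trained model may differ.
speech_classifier = pipeline(
    "audio-classification", model="superb/wav2vec2-base-superb-er"
)

def speech_emotion(wav_path: str) -> str:
    # The pipeline returns a ranked list of {"label", "score"} dicts.
    return speech_classifier(wav_path)[0]["label"]

if __name__ == "__main__":
    print("Facial emotion:", facial_emotion("frame.png"))
    print("Speech emotion:", speech_emotion("speech.wav"))
```

In a full application, the per-frame facial labels and per-utterance speech labels would then be aggregated and written into the PDF report along with the tutorial links.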